Browsing by Subject "Cluster analysis"
Item Combining Multivariate Statistical Methods and Spatial Analysis to Characterize Water Quality Conditions in the White River Basin, Indiana, U.S.A. (2011-02-25)
Gamble, Andrew Stephan; Babbar-Sebens, Meghna; Tedesco, Lenore P.; Peng, Hanxiang
This research performs a comparative study of techniques for combining spatial data and multivariate statistical methods for characterizing water quality conditions in a river basin. The study has been performed on the White River basin in central Indiana, and uses sixteen physical and chemical water quality parameters collected from 44 different monitoring sites, along with various spatial data related to land use – land cover, soil characteristics, terrain characteristics, eco-regions, etc. Various parameters related to the spatial data were analyzed using ArcHydro tools and were included in the multivariate analysis methods for the purpose of creating classification equations that relate spatial and spatio-temporal attributes of the watershed to water quality data at monitoring stations. The study compares the use of various statistical estimates (mean, geometric mean, trimmed mean, and median) of monitored water quality variables to represent annual and seasonal water quality conditions. The relationship between these estimates and the spatial data is then modeled via linear and non-linear multivariate methods. The linear statistical multivariate method uses a combination of principal component analysis, cluster analysis, and discriminant analysis, whereas the non-linear multivariate method uses a combination of Kohonen Self-Organizing Maps, Cluster Analysis, and Support Vector Machines. The final models were tested with recent and independent data collected from stations in the Eagle Creek watershed, within the White River basin. In 6 out of 20 models the Support Vector Machine more accurately classified the Eagle Creek stations, and in 2 out of 20 models the Linear Discriminant Analysis model achieved better results.
Neither the linear nor the non-linear models had an apparent advantage for the remaining 12 models. This research provides an insight into the variability and uncertainty in the interpretation of the various statistical estimates and statistical models, when water quality monitoring data is combined with spatial data for characterizing general spatial and spatio-temporal trends.
Item Delirium diagnosis defined by cluster analysis of symptoms versus diagnosis by DSM and ICD criteria: diagnostic accuracy study (BioMed Central, 2016-05-26)
Sepulveda, Esteban; Franco, Jose G.; Trzepacz, Paula T.; Gaviria, Ana M.; Meagher, David J.; Palma, Jose; Viñuelas, Eva; Grau, Imma; Vilella, Elisabet; de Pablo, Joan; Department of Psychiatry, IU School of Medicine
BACKGROUND: Information on validity and reliability of delirium criteria is necessary for clinicians, researchers, and further developments of DSM or ICD. We compare four DSM and ICD delirium diagnostic criteria versions, which were developed by consensus of experts, with a phenomenology-based natural diagnosis delineated using cluster analysis of delirium features in a sample with a high prevalence of dementia. We also measured inter-rater reliability of each system when applied by two evaluators from distinct disciplines. METHODS: Cross-sectional analysis of 200 consecutive patients admitted to a skilled nursing facility, independently assessed within 24-48 h after admission with the Delirium Rating Scale-Revised-98 (DRS-R98) and for DSM-III-R, DSM-IV, DSM-5, and ICD-10 criteria for delirium.
Cluster analysis (CA) delineated natural delirium and nondelirium reference groups using DRS-R98 items and then diagnostic systems' performance was evaluated against the CA-defined groups using logistic regression and crosstabs for discriminant analysis (sensitivity, specificity, percentage of subjects correctly classified by each diagnostic system and their individual criteria, and performance for each system when excluding each individual criterion are reported). Kappa Index (K) was used to report inter-rater reliability for delirium diagnostic systems and their individual criteria. RESULTS: 117 (58.5 %) patients had preexisting dementia according to the Informant Questionnaire on Cognitive Decline in the Elderly. CA delineated 49 delirium subjects and 151 nondelirium. Against these CA groups, delirium diagnosis accuracy was highest using DSM-III-R (87.5 %) followed closely by DSM-IV (86.0 %), ICD-10 (85.5 %) and DSM-5 (84.5 %). ICD-10 had the highest specificity (96.0 %) but lowest sensitivity (53.1 %). DSM-III-R had the best sensitivity (81.6 %) and the best sensitivity-specificity balance. DSM-5 had the highest inter-rater reliability (K = 0.73) while DSM-III-R criteria were the least reliable. CONCLUSIONS: Using our CA-defined, phenomenologically-based delirium designations as the reference standard, we found performance discordance among four diagnostic systems when tested in subjects where comorbid dementia was prevalent. The most complex diagnostic systems have higher accuracy and the newer DSM-5 has higher reliability.
Our novel phenomenological approach to designing a delirium reference standard may be preferred to guide revisions of diagnostic systems in the future.
Item IRIS-FGM: an integrative single-cell RNA-Seq interpretation system for functional gene module analysis (Oxford University Press, 2021)
Chang, Yuzhou; Allen, Carter; Wan, Changlin; Chung, Dongjun; Zhang, Chi; Li, Zihai; Ma, Qin; Medical and Molecular Genetics, School of Medicine
Summary: Single-cell RNA-Seq (scRNA-Seq) data is useful in discovering cell heterogeneity and signature genes in specific cell populations in cancer and other complex diseases. Specifically, the investigation of condition-specific functional gene modules (FGM) can help to understand interactive gene networks and complex biological processes in different cell clusters. QUBIC2 is recognized as one of the most efficient and effective biclustering tools for condition-specific FGM identification from scRNA-Seq data. However, because it is available only as a C implementation, its application has been restricted to only a few downstream analysis functionalities. We developed an R package named IRIS-FGM (Integrative scRNA-Seq Interpretation System for Functional Gene Module analysis) to support the investigation of FGMs and cell clustering using scRNA-Seq data. Empowered by QUBIC2, IRIS-FGM can effectively identify condition-specific FGMs, predict cell types/clusters, uncover differentially expressed genes and perform pathway enrichment analysis. It is noteworthy that IRIS-FGM can also take Seurat objects as input, facilitating easy integration with the existing analysis pipeline. Availability and implementation: IRIS-FGM is implemented in the R environment (as of version 3.6) with the source code freely available at https://github.com/BMEngineeR/IRISFGM.
Item Multivariate Statistical Methods Applied to the Analysis of Trace Evidence (2013-08-22)
Szkudlarek, Cheryl Ann; Goodpaster, John V.
(John Vincent); Picard, Christine; Siegel, Jay A.; Minto, Robert
The aim of this study was to use multivariate statistical techniques to: (1) determine the reproducibility of fiber evidence analyzed by MSP, (2) determine whether XRF is an appropriate technique for forensic tape analysis, and (3) determine if DART/MS is an appropriate technique for forensic tape analysis. This was achieved by employing several multivariate statistical techniques including agglomerative hierarchical clustering, principal component analysis, discriminant analysis, and analysis of variance. First, twelve dyed textile fibers were analyzed by UV-Visible MSP. This analysis included an inter-laboratory study, external validations, differing preprocessing techniques, and color coordinates. The inter-laboratory study showed no statistically significant difference between the different instruments. The external validations had overall acceptable results. Using first derivatives as a preprocessing technique and color coordinates to define color did not result in any additional information. Next, the tape backings of thirty-three brands were analyzed by XRF. After chemometric analysis it was concluded that the 3M tapes with black adhesive can be classified by brand except for Super 33+ (Cold Weather) and Super 88. The colorless adhesive tapes were separated into two large groups which were correlated with the presence of aluminosilicate filler. Overall, no additional discrimination was seen by using XRF compared to the traditional instrumentation for tape analysis previously published. Lastly, the backings of eighty-nine brands of tape were analyzed by DART/MS. The analysis of the black adhesive tapes showed that again discrimination between brands is possible except for Super 33+ and Super 88. However, now Tartan and Temflex have become indistinguishable.
The colorless adhesive tapes again were more or less indistinguishable from one another with the exception of Tuff Hand Tool, Qualpack, and a roll of 3M Tartan, which were found to be unique. It cannot be determined if additional discrimination was achieved with DART/MS because the multivariate statistical techniques have not been applied to the other instrumental techniques used during tape analysis.
Item Response to ‘Letter to the Editor: on the stability and internal consistency of component-wise sparse mixture regression based clustering’, Zhang et al. (Oxford University Press, 2022)
Chang, Wennan; Zhang, Chi; Cao, Sha; Biostatistics and Health Data Science, School of Medicine