Browsing by Subject "Cluster analysis"
A dynamic time order network for time-series gene expression data analysis
(Springer Nature, 2012) Zhang, Pengyue; Mourad, Raphaël; Xiang, Yang; Huang, Kun; Huang, Tim; Nephew, Kenneth; Liu, Yunlong; Li, Lang; Center for Computational Biology and Bioinformatics, School of Medicine
Background: Typical analysis of time-series gene expression data such as clustering or graphical models cannot distinguish between early and later drug responsive gene targets in cancer cells. However, these genes would represent good candidate biomarkers. Results: We propose a new model - the dynamic time order network - to distinguish and connect early and later drug responsive gene targets. This network is constructed based on an integrated differential equation. Spline regression is applied for an accurate modeling of the time variation of gene expressions. Then a likelihood ratio test is implemented to infer the time order of any gene expression pair. One application of the model is the discovery of estrogen response biomarkers. For this purpose, we focused on genes whose responses are late when the breast cancer cells are treated with estradiol (E2). Conclusions: Our approach has been validated by successfully finding time order relations between genes of the cell cycle system. More notably, we found late response genes potentially interesting as biomarkers of E2 treatment.

Clustering individuals using INMTD: a novel versatile multi-view embedding framework integrating omics and imaging data
(Oxford University Press, 2025) Li, Zuqi; Windels, Sam F. L.; Malod-Dognin, Noël; Weinberg, Seth M.; Marazita, Mary L.; Walsh, Susan; Shriver, Mark D.; Fardo, David W.; Claes, Peter; Pržulj, Nataša; Van Steen, Kristel; Biology, School of Science
Motivation: Combining omics and images can lead to a more comprehensive clustering of individuals than classic single-view approaches.
Among the various approaches for multi-view clustering, nonnegative matrix tri-factorization (NMTF) and nonnegative Tucker decomposition (NTD) are advantageous in learning low-rank embeddings with promising interpretability. Besides, there is a need to handle unwanted drivers of clusterings (i.e. confounders). Results: In this work, we introduce a novel multi-view clustering method based on NMTF and NTD, named INMTD, which integrates omics and 3D imaging data to derive unconfounded subgroups of individuals. According to the adjusted Rand index, INMTD outperformed other clustering methods on a synthetic dataset with known clusters. In the application to real-life facial-genomic data, INMTD generated biologically relevant embeddings for individuals, genetics, and facial morphology. By removing confounded embedding vectors, we derived an unconfounded clustering with better internal and external quality; the genetic and facial annotations of each derived subgroup highlighted distinctive characteristics. In conclusion, INMTD can effectively integrate omics data and 3D images for unconfounded clustering with biologically meaningful interpretation. Availability and implementation: INMTD is freely available at https://github.com/ZuqiLi/INMTD.

Combining Multivariate Statistical Methods and Spatial Analysis to Characterize Water Quality Conditions in the White River Basin, Indiana, U.S.A.
(2011-02-25) Gamble, Andrew Stephan; Babbar-Sebens, Meghna; Tedesco, Lenore P.; Peng, Hanxiang
This research performs a comparative study of techniques for combining spatial data and multivariate statistical methods for characterizing water quality conditions in a river basin. The study has been performed on the White River basin in central Indiana, and uses sixteen physical and chemical water quality parameters collected from 44 different monitoring sites, along with various spatial data related to land use – land cover, soil characteristics, terrain characteristics, eco-regions, etc.
Various parameters related to the spatial data were analyzed using ArcHydro tools and were included in the multivariate analysis methods for the purpose of creating classification equations that relate spatial and spatio-temporal attributes of the watershed to water quality data at monitoring stations. The study compares the use of various statistical estimates (mean, geometric mean, trimmed mean, and median) of monitored water quality variables to represent annual and seasonal water quality conditions. The relationship between these estimates and the spatial data is then modeled via linear and non-linear multivariate methods. The linear statistical multivariate method uses a combination of principal component analysis, cluster analysis, and discriminant analysis, whereas the non-linear multivariate method uses a combination of Kohonen Self-Organizing Maps, Cluster Analysis, and Support Vector Machines. The final models were tested with recent and independent data collected from stations in the Eagle Creek watershed, within the White River basin. In 6 out of 20 models the Support Vector Machine more accurately classified the Eagle Creek stations, and in 2 out of 20 models the Linear Discriminant Analysis model achieved better results. Neither the linear nor the non-linear models had an apparent advantage for the remaining 12 models.
This research provides an insight into the variability and uncertainty in the interpretation of the various statistical estimates and statistical models, when water quality monitoring data is combined with spatial data for characterizing general spatial and spatio-temporal trends.

Delirium diagnosis defined by cluster analysis of symptoms versus diagnosis by DSM and ICD criteria: diagnostic accuracy study
(BioMed Central, 2016-05-26) Sepulveda, Esteban; Franco, Jose G.; Trzepacz, Paula T.; Gaviria, Ana M.; Meagher, David J.; Palma, Jose; Viñuelas, Eva; Grau, Imma; Vilella, Elisabet; de Pablo, Joan; Department of Psychiatry, IU School of Medicine
BACKGROUND: Information on validity and reliability of delirium criteria is necessary for clinicians, researchers, and further developments of DSM or ICD. We compare four DSM and ICD delirium diagnostic criteria versions, which were developed by consensus of experts, with a phenomenology-based natural diagnosis delineated using cluster analysis of delirium features in a sample with a high prevalence of dementia. We also measured inter-rater reliability of each system when applied by two evaluators from distinct disciplines. METHODS: Cross-sectional analysis of 200 consecutive patients admitted to a skilled nursing facility, independently assessed within 24-48 h after admission with the Delirium Rating Scale-Revised-98 (DRS-R98) and for DSM-III-R, DSM-IV, DSM-5, and ICD-10 criteria for delirium. Cluster analysis (CA) delineated natural delirium and nondelirium reference groups using DRS-R98 items and then diagnostic systems' performance was evaluated against the CA-defined groups using logistic regression and crosstabs for discriminant analysis (sensitivity, specificity, percentage of subjects correctly classified by each diagnostic system and their individual criteria, and performance for each system when excluding each individual criterion are reported).
Kappa Index (K) was used to report inter-rater reliability for delirium diagnostic systems and their individual criteria. RESULTS: 117 (58.5 %) patients had preexisting dementia according to the Informant Questionnaire on Cognitive Decline in the Elderly. CA delineated 49 delirium subjects and 151 nondelirium. Against these CA groups, delirium diagnosis accuracy was highest using DSM-III-R (87.5 %) followed closely by DSM-IV (86.0 %), ICD-10 (85.5 %) and DSM-5 (84.5 %). ICD-10 had the highest specificity (96.0 %) but lowest sensitivity (53.1 %). DSM-III-R had the best sensitivity (81.6 %) and the best sensitivity-specificity balance. DSM-5 had the highest inter-rater reliability (K = 0.73) while DSM-III-R criteria were the least reliable. CONCLUSIONS: Using our CA-defined, phenomenologically-based delirium designations as the reference standard, we found performance discordance among four diagnostic systems when tested in subjects where comorbid dementia was prevalent. The most complex diagnostic systems have higher accuracy and the newer DSM-5 has higher reliability. Our novel phenomenological approach to designing a delirium reference standard may be preferred to guide revisions of diagnostic systems in the future.

Designing a Natural Experiment to Evaluate a National Health Care–Community Partnership to Prevent Type 2 Diabetes
(CDC, 2013) Ackermann, Ronald T.; Holmes, Ann M.; Saha, Chandan; Health Policy and Management, Richard M. Fairbanks School of Public Health
To address the growing incidence of type 2 diabetes in the United States, UnitedHealth Group, the YMCA of the USA, and the Centers for Disease Control and Prevention have partnered to bring a group-based adaptation of the Diabetes Prevention Program lifestyle intervention to a national scale. Researchers at Northwestern and Indiana universities are collaborating with these partners to design a robust evaluation of the reach, effectiveness, and costs of this natural experiment.
We will employ a quasi-experimental, cluster-randomized study design and combine administrative, clinical, and programmatic data from existing sources to derive reliable, timely, and policy-relevant estimates of the program's impact and potential for sustainability. In this context, evaluation results will provide information about the unique role of a health care-community partnership to prevent type 2 diabetes.

Endotyping Chronic Rhinosinusitis Based on Olfactory Cleft Mucus Biomarkers
(Elsevier, 2021) Soler, Zachary M.; Schlosser, Rodney J.; Bodner, Todd E.; Alt, Jeremiah A.; Ramakrishnan, Vijay R.; Mattos, Jose L.; Mulligan, Jennifer K.; Mace, Jess C.; Smith, Timothy L.; Otolaryngology -- Head and Neck Surgery, School of Medicine
Background: Although chronic rhinosinusitis (CRS) is considered the most treatable form of olfactory dysfunction, there has been relatively little clinical attention focused on assessing endotypes as they pertain to olfactory loss. Objectives: The goal of this study was to explore inflammatory endotypes in CRS using an unsupervised cluster analysis of olfactory cleft (OC) biomarkers in a phenotype-free approach. Methods: Patients with CRS were prospectively recruited and psychophysical olfactory testing, Questionnaire of Olfactory Dysfunction (QOD-NS), and bilateral OC endoscopy were obtained. Mucus was collected from the OC and evaluated for 26 biomarkers using principal component analysis. Cluster analysis was performed using only OC biomarkers and differences in olfactory measures were compared across clusters. Results: A total of 198 subjects (128 with CRS and 70 controls) were evaluated. Evaluation of OC biomarkers indicated 6 principal components, explaining 69.50% of the variance, with type 2, mixed type 1/Th17-cell, growth factor, and neutrophil chemoattractant inflammatory signatures.
A total of 10 clusters were identified that differed significantly in the frequency of controls, subjects with CRS with nasal polyps, and subjects with CRS without nasal polyps (likelihood ratio test, χ²(18) = 178.64; P < .001). Olfactory measures differed significantly across clusters, including olfactory testing, QOD-NS, and OC endoscopy (P < .001 for all). Conclusions: Clustering based solely on OC biomarkers can organize patients into clinically meaningful endotypes that discriminate between subjects with CRS and controls. Validation studies are necessary to confirm these findings and further refine olfactory endotypes.

IRIS-FGM: an integrative single-cell RNA-Seq interpretation system for functional gene module analysis
(Oxford University Press, 2021) Chang, Yuzhou; Allen, Carter; Wan, Changlin; Chung, Dongjun; Zhang, Chi; Li, Zihai; Ma, Qin; Medical and Molecular Genetics, School of Medicine
Summary: Single-cell RNA-Seq (scRNA-Seq) data is useful in discovering cell heterogeneity and signature genes in specific cell populations in cancer and other complex diseases. Specifically, the investigation of condition-specific functional gene modules (FGM) can help to understand interactive gene networks and complex biological processes in different cell clusters. QUBIC2 is recognized as one of the most efficient and effective biclustering tools for condition-specific FGM identification from scRNA-Seq data. However, its limited availability to a C implementation restricted its application to only a few downstream analysis functionalities. We developed an R package named IRIS-FGM (Integrative scRNA-Seq Interpretation System for Functional Gene Module analysis) to support the investigation of FGMs and cell clustering using scRNA-Seq data. Empowered by QUBIC2, IRIS-FGM can effectively identify condition-specific FGMs, predict cell types/clusters, uncover differentially expressed genes and perform pathway enrichment analysis.
It is noteworthy that IRIS-FGM can also take Seurat objects as input, facilitating easy integration with the existing analysis pipeline. Availability and implementation: IRIS-FGM is implemented in the R environment (as of version 3.6) with the source code freely available at https://github.com/BMEngineeR/IRISFGM.

Multivariate Statistical Methods Applied to the Analysis of Trace Evidence
(2013-08-22) Szkudlarek, Cheryl Ann; Goodpaster, John V. (John Vincent); Picard, Christine; Siegel, Jay A.; Minto, Robert
The aim of this study was to use multivariate statistical techniques to: (1) determine the reproducibility of fiber evidence analyzed by MSP, (2) determine whether XRF is an appropriate technique for forensic tape analysis, and (3) determine if DART/MS is an appropriate technique for forensic tape analysis. This was achieved by employing several multivariate statistical techniques including agglomerative hierarchical clustering, principal component analysis, discriminant analysis, and analysis of variance. First, twelve dyed textile fibers were analyzed by UV-Visible MSP. This analysis included an inter-laboratory study, external validations, differing preprocessing techniques, and color coordinates. The inter-laboratory study showed no statistically significant difference between the different instruments. The external validations had overall acceptable results. Using first derivatives as a preprocessing technique and color coordinates to define color did not result in any additional information. Next, the tape backings of thirty-three brands were analyzed by XRF. After chemometric analysis it was concluded that the 3M tapes with black adhesive can be classified by brand except for Super 33+ (Cold Weather) and Super 88. The colorless adhesive tapes were separated into two large groups which were correlated with the presence of aluminosilicate filler.
Overall, no additional discrimination was seen by using XRF compared to the traditional instrumentation for tape analysis previously published. Lastly, the backings of eighty-nine brands of tape were analyzed by DART/MS. The analysis of the black adhesive tapes showed that again discrimination between brands is possible except for Super 33+ and Super 88. However, now Tartan and Temflex have become indistinguishable. The colorless adhesive tapes again were more or less indistinguishable from one another with the exception of Tuff Hand Tool, Qualpack, and a roll of 3M Tartan, which were found to be unique. It cannot be determined if additional discrimination was achieved with DART/MS because the multivariate statistical techniques have not been applied to the other instrumental techniques used during tape analysis.

Response to ‘Letter to the Editor: on the stability and internal consistency of component-wise sparse mixture regression based clustering’, Zhang et al.
(Oxford University Press, 2022) Chang, Wennan; Zhang, Chi; Cao, Sha; Biostatistics and Health Data Science, School of Medicine

scGNN 2.0: a graph neural network tool for imputation and clustering of single-cell RNA-Seq data
(Oxford University Press, 2022) Gu, Haocheng; Cheng, Hao; Ma, Anjun; Li, Yang; Wang, Juexin; Xu, Dong; Ma, Qin; Biostatistics and Health Data Science, Richard M. Fairbanks School of Public Health
Motivation: Gene expression imputation has been an essential step of the single-cell RNA-Seq data analysis workflow. Among several deep-learning methods, the debut of scGNN gained substantial recognition in 2021 for its superior performance and the ability to produce a cell-cell graph. However, the implementation of scGNN was relatively time-consuming and its performance could still be optimized. Results: The implementation of scGNN 2.0 is significantly faster than scGNN thanks to a simplified close-loop architecture.
For all eight datasets, cell clustering performance was increased by 85.02% on average in terms of adjusted Rand index, and the imputation Median L1 Error was reduced by 67.94% on average. With the built-in visualizations, users can quickly assess the imputation and cell clustering results, compare against benchmarks and interpret the cell-cell interaction. The expanded input and output formats also pave the way for custom workflows that integrate scGNN 2.0 with other scRNA-Seq toolkits on both Python and R platforms. Availability and implementation: scGNN 2.0 is implemented in Python (as of version 3.8) with the source code available at https://github.com/OSU-BMBL/scGNN2.0.
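
Note on the adjusted Rand index: two of the records in this listing (INMTD and scGNN 2.0) report clustering quality with the adjusted Rand index. For readers unfamiliar with the measure, the following is a minimal sketch of the standard pair-counting formula, assuming two flat label lists over the same items. The function name `adjusted_rand_index` is chosen here for illustration and is not taken from either tool's code base, and the toy labels are made up.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat clusterings of the same items.

    Illustrative helper (hypothetical name), not code from INMTD or scGNN 2.0.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the labelings trivially agree

    # Contingency counts: items per (reference cluster, predicted cluster) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of item pairs placed together in both, or in each, clustering.
    index = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (index - expected) / (max_index - expected)


# Toy labels (made up): the score ignores how clusters are named.
same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_partition = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same_partition, crossed_partition)
```

A score of 1 means the two partitions are identical up to relabeling, values near 0 indicate chance-level agreement, and negative values indicate less agreement than chance. Because the index can be zero or negative for a baseline, an average percentage increase in it should be read with the baseline values in mind.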