Supervised clustering of high-dimensional data using regularized mixture modeling

dc.contributor.author: Chang, Wennan
dc.contributor.author: Wan, Changlin
dc.contributor.author: Zang, Yong
dc.contributor.author: Zhang, Chi
dc.contributor.author: Cao, Sha
dc.contributor.department: Medical and Molecular Genetics, School of Medicine
dc.date.accessioned: 2023-04-12T14:26:36Z
dc.date.available: 2023-04-12T14:26:36Z
dc.date.issued: 2021-07-20
dc.description.abstract: Identifying relationships between genetic variations and their clinical presentations is complicated by the heterogeneous causes of a disease. It is imperative to unveil the relationship between high-dimensional genetic manifestations and clinical presentations while taking into account the possible heterogeneity of the study subjects. We propose a novel supervised clustering algorithm based on a penalized mixture regression model, called component-wise sparse mixture regression (CSMR), to address the challenges of studying the heterogeneous relationships between high-dimensional genetic features and a phenotype. The algorithm is adapted from the classification expectation maximization algorithm and offers a novel supervised solution to the clustering problem, with substantial improvements in both computational efficiency and biological interpretability. Experimental evaluation on simulated benchmark datasets demonstrated that CSMR can accurately identify the subspaces on which subsets of features explain the response variables, and that it outperformed the baseline methods. Application of CSMR to a drug sensitivity dataset again demonstrated its superior performance over the other methods: CSMR is powerful in recapitulating the distinct subgroups hidden in the pool of cell lines with regard to their coping mechanisms for different drugs. CSMR represents a big data analysis tool with the potential to resolve the complexity of translating the clinical presentations of a disease to the real causes underpinning it. We believe that it will bring new understanding to the molecular basis of disease and could be of special relevance in the growing field of personalized medicine.
dc.eprint.version: Final published version
dc.identifier.citation: Chang W, Wan C, Zang Y, Zhang C, Cao S. Supervised clustering of high-dimensional data using regularized mixture modeling. Brief Bioinform. 2021;22(4):bbaa291. doi:10.1093/bib/bbaa291
dc.identifier.uri: https://hdl.handle.net/1805/32345
dc.language.iso: en_US
dc.publisher: Oxford University Press
dc.relation.isversionof: 10.1093/bib/bbaa291
dc.relation.journal: Briefings in Bioinformatics
dc.rights: Publisher Policy
dc.source: PMC
dc.subject: Supervised learning
dc.subject: Mixture modeling
dc.subject: Disease heterogeneity
dc.title: Supervised clustering of high-dimensional data using regularized mixture modeling
dc.type: Article
ul.alternative.fulltext: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8294591/
Files
Original bundle
Name: bbaa291.pdf
Size: 819.71 KB
Format: Adobe Portable Document Format