Discovery and Interpretation of Subspace Structures in Omics Data by Low-Rank Representation

dc.contributor.advisor: Cao, Sha
dc.contributor.author: Lu, Xiaoyu
dc.contributor.other: Zhang, Chi
dc.contributor.other: Yan, Jingwen
dc.contributor.other: Zang, Yong
dc.date.accessioned: 2022-11-08T13:41:08Z
dc.date.available: 2022-11-08T13:41:08Z
dc.date.issued: 2022-10
dc.degree.date: 2022 [en_US]
dc.degree.discipline: School of Informatics & Computing
dc.degree.grantor: Indiana University [en_US]
dc.degree.level: Ph.D. [en_US]
dc.description: Indiana University-Purdue University Indianapolis (IUPUI) [en_US]
dc.description.abstract: Biological functions in cells are highly complex and heterogeneous, and they are reflected in omics data such as gene expression levels. Detecting subspace structures in omics data and understanding the diversity of biological processes are essential to a full comprehension of biological mechanisms and complex biological systems. In this thesis, we develop novel statistical learning approaches to reveal the subspace structures in omics data. Specifically, we focus on three types of subspace structures: the low-rank subspace, the sparse subspace, and the covariate-explainable subspace. For the low-rank subspace, we developed a semi-supervised model, SSMD, to detect cell-type-specific low-rank structures and predict the relative proportions of cell types across different tissue samples. SSMD is the first computational tool that uses semi-supervised identification of cell types and of their marker genes specific to each mouse tissue transcriptomics dataset, enabling a better understanding of the disease microenvironment and downstream disease mechanisms. For the sparse subspace, we proposed a novel positive and unlabeled learning model, PLUS, that identifies cancer metastasis-related genes, predicts cancer metastasis status, and specifically addresses the under-diagnosis issue in studying metastasis potential. We found that the metastasis potential predicted by PLUS at diagnosis is significantly associated with patients' progression-free survival in their follow-up data. Lastly, to discover the covariate-explainable subspace, we proposed an analytical pipeline based on covariance regression, scCovReg. We used scCovReg to detect pathway-level second-order variations in scRNA-Seq data in a statistically powerful manner, and to associate these second-order variations with important subject-level characteristics, such as disease status.
In conclusion, we presented a set of state-of-the-art computational solutions for identifying sparse subspaces in omics data, which promise to provide insights into the mechanisms of complex diseases. [en_US]
dc.identifier.uri: https://hdl.handle.net/1805/30484
dc.identifier.uri: http://dx.doi.org/10.7912/C2/3053
dc.language.iso: en_US [en_US]
dc.subject: Bioinformatics [en_US]
dc.subject: Computational Biology [en_US]
dc.subject: Low-Rank Representation [en_US]
dc.subject: Subspace [en_US]
dc.title: Discovery and Interpretation of Subspace Structures in Omics Data by Low-Rank Representation [en_US]
dc.type: Thesis
Files
Original bundle
Name: Lu_iupui_0104D_10628.pdf
Size: 5.25 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.99 KB
Format: Item-specific license agreed upon to submission