Discovery and Interpretation of Subspace Structures in Omics Data by Low-Rank Representation

Lu, Xiaoyu

Files

Lu_iupui_0104D_10628.pdf (5.25 MB)

Date

2022-10

Authors

Lu, Xiaoyu

Language

American English

Committee Chair

Cao, Sha

Committee Members

Zhang, Chi
Yan, Jingwen
Zang, Yong

Degree

Ph.D.

Degree Year

2022

Department

School of Informatics & Computing

Grantor

Indiana University

Abstract

Biological functions in cells are highly complex and heterogeneous, and they can be reflected in omics data such as gene expression levels. Detecting subspace structures in omics data and understanding the diversity of biological processes are essential to fully comprehending biological mechanisms and complex biological systems. In this thesis, we develop novel statistical learning approaches to reveal subspace structures in omics data. Specifically, we focus on three types of subspace structures: low-rank subspaces, sparse subspaces, and covariate-explainable subspaces. For the low-rank subspace, we developed a semi-supervised model, SSMD, to detect cell-type-specific low-rank structures and predict their relative proportions across different tissue samples. SSMD is the first computational tool to use semi-supervised identification of cell types and their marker genes specific to each mouse tissue transcriptomics dataset, enabling a better understanding of the disease microenvironment and downstream disease mechanisms. For the sparsity-driven sparse subspace, we proposed a novel positive and unlabeled learning model, PLUS, that identifies cancer metastasis-related genes, predicts cancer metastasis status, and specifically addresses the under-diagnosis issue in studying metastasis potential. We found that the metastasis potential predicted by PLUS at diagnosis is significantly associated with patients' progression-free survival in their follow-up data. Lastly, to discover the covariate-explainable subspace, we proposed an analytical pipeline based on covariance regression, scCovReg. We used scCovReg to detect pathway-level second-order variations in scRNA-Seq data in a statistically powerful manner, and to associate these second-order variations with important subject-level characteristics, such as disease status.
In conclusion, we presented a set of state-of-the-art computational solutions for identifying subspace structures in omics data, which promise to provide insights into the mechanisms of complex diseases.
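
The abstract refers to predicting the relative proportions of cell types across tissue samples. As a minimal illustration of that general idea only, the Python sketch below assumes a generic reference-based mixture model, X ≈ S P. Here a bulk expression matrix X (genes × samples) is treated as a mix of known cell-type profiles S (genes × cell types) with proportions P (cell types × samples). This is not SSMD, which identifies cell types and marker genes in a semi-supervised way rather than taking a fixed signature matrix as given. The function name `estimate_proportions`, the ordinary least-squares solver, and the clip-and-renormalize step are hypothetical choices made for illustration.

```python
import numpy as np


def estimate_proportions(bulk: np.ndarray, signature: np.ndarray) -> np.ndarray:
    """Hypothetical helper: estimate cell-type proportions for each bulk sample.

    bulk:      genes x samples matrix of bulk expression (X).
    signature: genes x cell_types matrix of reference profiles (S), assumed known here.
    Returns a cell_types x samples matrix P whose columns sum to 1.
    """
    # Unconstrained least squares for X ≈ S P, solved for all samples at once.
    coef, *_ = np.linalg.lstsq(signature, bulk, rcond=None)
    # Illustrative heuristic: clip negative estimates to zero,
    # then rescale each sample (column) so its proportions sum to 1.
    coef = np.clip(coef, 0.0, None)
    totals = coef.sum(axis=0, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for degenerate samples
    return coef / totals


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signature = rng.uniform(0.0, 10.0, size=(50, 3))  # 50 genes, 3 hypothetical cell types
    true_p = np.array([[0.6, 0.2],
                       [0.3, 0.5],
                       [0.1, 0.3]])  # proportions for 2 simulated samples
    bulk = signature @ true_p  # noise-free simulated bulk expression
    print(np.round(estimate_proportions(bulk, signature), 3))
```

With noise-free simulated data and a full-rank signature matrix, the estimate recovers the simulated proportions up to floating-point error. For real, noisy data a constrained solver such as non-negative least squares would be a more principled choice than clipping.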

Description

Indiana University-Purdue University Indianapolis (IUPUI)

Keywords

Bioinformatics, Computational Biology, Low-Rank Representation, Subspace

Type

Thesis

Permanent Link

https://hdl.handle.net/1805/30484
http://dx.doi.org/10.7912/C2/3053

Collections

Informatics School Theses and Dissertations
