Sparse Latent-Space Learning for High-Dimensional Data: Extensions and Applications

White, Alexander James

Files

White_iupui_0104D_10676.pdf (10.66 MB)

Date

2023-05

Authors

White, Alexander James

Language

American English

Committee Chair

Cao, Sha

Committee Members

Tu, Wanzhu
Zhang, Chi
Zhao, Yi

Degree

Ph.D.

Degree Year

2023

Department

Biostatistics

Grantor

Indiana University

Abstract

The successful treatment and potential eradication of many complex diseases, such as cancer, begins with elucidating the convoluted mapping from molecular profiles to phenotypic manifestation. Observed molecular profiles (e.g., genomics, transcriptomics, epigenomics) are often high-dimensional and are collected from patient samples that fall into heterogeneous disease subtypes. Interpretable learning from such data calls for sparsity-driven models. This dissertation addresses the high dimensionality, sparsity, and heterogeneity that arise when analyzing multi-omics data; each method is implemented in an accompanying R package. First, we examine challenges in submatrix identification, which aims to find subgroups of samples that behave similarly across a subset of features. We resolve issues such as two-way sparsity, non-orthogonality, and parameter tuning with an adaptive thresholding procedure on the singular vectors computed via orthogonal iteration. We validate the method with simulation analysis and apply it to an Alzheimer’s disease dataset. The second project focuses on modeling relationships between large, matched datasets. Exploring regression structures between large datasets can provide insights such as the effect of long-range epigenetic influences on gene expression. We present a high-dimensional version of mixture multivariate regression to detect patient clusters, each with a different correlation structure between the matched-omics datasets. Results are validated via simulation and applied to matched-omics datasets. In the third project, we introduce a novel approach to modeling spatial transcriptomics (ST) data with a spatially penalized multinomial model of the expression counts. This method recovers the low-rank structure of zero-inflated ST data under spatial smoothness constraints. We validate the model using manual cell structure annotations of human brain samples and then apply the technique to additional ST datasets.

Description

Indiana University-Purdue University Indianapolis (IUPUI)

Keywords

Applied high dimensional statistics, Computational statistics, Latent, Low rank structure, Sparse, Spatial transcriptomics

Type

Thesis

Permanent Link

https://hdl.handle.net/1805/33295
http://dx.doi.org/10.7912/C2/3150

This item is under embargo

2025-05-22

Collections

Biostatistics Department Theses and Dissertations