Sparse Latent-Space Learning for High-Dimensional Data: Extensions and Applications

dc.contributor.advisor: Cao, Sha
dc.contributor.author: White, Alexander James
dc.contributor.other: Tu, Wanzhu
dc.contributor.other: Zhang, Chi
dc.contributor.other: Zhao, Yi
dc.date.accessioned: 2023-05-24T19:42:45Z
dc.date.available: 2023-05-24T19:42:45Z
dc.date.issued: 2023-05
dc.degree.date: 2023 (en_US)
dc.degree.discipline:
dc.degree.grantor: Indiana University (en_US)
dc.degree.level: Ph.D. (en_US)
dc.description: Indiana University-Purdue University Indianapolis (IUPUI) (en_US)
dc.description.abstract: The successful treatment and potential eradication of many complex diseases, such as cancer, begins with elucidating the convoluted mapping of molecular profiles to phenotypic manifestation. Observed molecular profiles (e.g., genomics, transcriptomics, epigenomics) are often high-dimensional and are collected from patient samples that fall into heterogeneous disease subtypes. Interpretable learning from such data calls for sparsity-driven models. This dissertation addresses the high dimensionality, sparsity, and heterogeneity issues that arise when analyzing multi-omics data, and each method is implemented in an accompanying R package. First, we examine challenges in submatrix identification, which aims to find subgroups of samples that behave similarly across a subset of features. We resolve issues such as two-way sparsity, non-orthogonality, and parameter tuning with an adaptive thresholding procedure on the singular vectors computed via orthogonal iteration. We validate the method with simulation analysis and apply it to an Alzheimer’s disease dataset. The second project focuses on modeling relationships between large, matched datasets. Exploring regression structures between large datasets can provide insights such as the effect of long-range epigenetic influences on gene expression. We present a high-dimensional version of mixture multivariate regression to detect patient clusters, each with a different correlation structure between the matched-omics datasets. Results are validated via simulation and applied to matched-omics datasets. In the third project, we introduce a novel approach to modeling spatial transcriptomics (ST) data with a spatially penalized multinomial model of the expression counts. This method estimates the low-rank structure of zero-inflated ST data under spatial smoothness constraints. We validate the model using manual cell structure annotations of human brain samples. We then apply this technique to additional ST datasets. (en_US)
dc.description.embargo: 2025-05-22
dc.identifier.uri: https://hdl.handle.net/1805/33295
dc.identifier.uri: http://dx.doi.org/10.7912/C2/3150
dc.language.iso: en_US (en_US)
dc.subject: Applied high dimensional statistics (en_US)
dc.subject: Computational statistics (en_US)
dc.subject: Latent (en_US)
dc.subject: Low rank structure (en_US)
dc.subject: Sparse (en_US)
dc.subject: Spatial transcriptomics (en_US)
dc.title: Sparse Latent-Space Learning for High-Dimensional Data: Extensions and Applications (en_US)
dc.type: Dissertation
Files
Original bundle
Name:
White_iupui_0104D_10676.pdf
Size:
10.66 MB
Format:
Adobe Portable Document Format