Sufficient principal component regression for pattern discovery in transcriptomic data

dc.contributor.author: Ding, Lei
dc.contributor.author: Zentner, Gabriel E.
dc.contributor.author: McDonald, Daniel J.
dc.contributor.department: Biology, School of Science [en_US]
dc.date.accessioned: 2023-07-10T14:36:46Z
dc.date.available: 2023-07-10T14:36:46Z
dc.date.issued: 2022-05-14
dc.description.abstract: Motivation: Methods for the global measurement of transcript abundance such as microarrays and RNA-Seq generate datasets in which the number of measured features far exceeds the number of observations. Extracting biologically meaningful and experimentally tractable insights from such data therefore requires high-dimensional prediction. Existing sparse linear approaches to this challenge have been stunningly successful, but some important issues remain. These methods can fail to select the correct features, predict poorly relative to non-sparse alternatives or ignore any unknown grouping structures for the features. Results: We propose a method called SuffPCR that yields improved predictions in high-dimensional tasks including regression and classification, especially in the typical context of omics with correlated features. SuffPCR first estimates sparse principal components and then estimates a linear model on the recovered subspace. Because the estimated subspace is sparse in the features, the resulting predictions will depend on only a small subset of genes. SuffPCR works well on a variety of simulated and experimental transcriptomic data, performing nearly optimally when the model assumptions are satisfied. We also demonstrate near-optimal theoretical guarantees. Availability and implementation: Code and raw data are freely available at https://github.com/dajmcdon/suffpcr. Package documentation may be viewed at https://dajmcdon.github.io/suffpcr. [en_US]
dc.eprint.version: Final published version [en_US]
dc.identifier.citation: Ding L, Zentner GE, McDonald DJ. Sufficient principal component regression for pattern discovery in transcriptomic data. Bioinform Adv. 2022;2(1):vbac033. Published 2022 May 14. doi:10.1093/bioadv/vbac033 [en_US]
dc.identifier.uri: https://hdl.handle.net/1805/34278
dc.language.iso: en_US [en_US]
dc.publisher: Oxford University Press [en_US]
dc.relation.isversionof: 10.1093/bioadv/vbac033 [en_US]
dc.relation.journal: Bioinformatics Advances [en_US]
dc.rights: Attribution 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [*]
dc.source: PMC [en_US]
dc.subject: Microarrays [en_US]
dc.subject: RNA-sequencing [en_US]
dc.subject: Prediction [en_US]
dc.subject: Regression [en_US]
dc.subject: Classification [en_US]
dc.subject: Transcriptomic data [en_US]
dc.title: Sufficient principal component regression for pattern discovery in transcriptomic data [en_US]
dc.type: Article [en_US]
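
The abstract describes a two-step idea: estimate sparse principal components, then fit a linear model on the recovered subspace, so that predictions depend on only a few features. The following is a minimal sketch of that general idea and not the authors' SuffPCR algorithm or package API. It assumes a single component, swaps in a simple truncated power iteration as the sparse PCA step, and uses simulated toy data; every function name here (`truncated_power_iteration`, `fit_sparse_pcr`, `predict`) is hypothetical.

```python
import math
import random


def truncated_power_iteration(cov, k, iters=200):
    """Approximate leading eigenvector of `cov` with at most k nonzero entries.

    Assumption: a stand-in sparse PCA step, not the estimator from the paper.
    """
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        # Power step: multiply the current vector by the covariance matrix.
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        # Truncation step: keep the k largest entries in absolute value.
        keep = set(sorted(range(p), key=lambda i: abs(w[i]), reverse=True)[:k])
        w = [w[i] if i in keep else 0.0 for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def fit_sparse_pcr(X, y, k):
    """Regress y on the score of one sparse component; return (coef, intercept)."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    cov = [[sum(r[i] * r[j] for r in Xc) / n for j in range(p)] for i in range(p)]
    v = truncated_power_iteration(cov, k)
    # Project the centered data onto the sparse direction.
    scores = [sum(r[j] * v[j] for j in range(p)) for r in Xc]
    # One-dimensional least squares of centered y on the scores.
    slope = sum(s * (yi - y_mean) for s, yi in zip(scores, y)) / sum(
        s * s for s in scores
    )
    # Map back to feature space: coefficients are zero outside the support of v.
    coef = [slope * vj for vj in v]
    intercept = y_mean - sum(c * m for c, m in zip(coef, col_means))
    return coef, intercept


def predict(coef, intercept, x):
    """Linear prediction for a single feature vector x."""
    return intercept + sum(c * xj for c, xj in zip(coef, x))


# Simulated toy data: features 0-2 share one latent factor, the rest are noise.
random.seed(0)
n, p, k = 40, 8, 3
X, y = [], []
for _ in range(n):
    z = random.gauss(0, 1)
    X.append(
        [3 * z + random.gauss(0, 0.1) if j < 3 else random.gauss(0, 0.1)
         for j in range(p)]
    )
    y.append(2 * z + random.gauss(0, 0.1))

coef, intercept = fit_sparse_pcr(X, y, k)
selected = [j for j, c in enumerate(coef) if c != 0.0]
print("features used by the fitted model:", selected)
```

In this sketch the sparsity level `k` is fixed by hand; in practice it would need to be chosen from the data, for example by cross-validation. Because the fitted coefficients are exactly zero outside the selected features, changing any other feature leaves the prediction unchanged.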
Files
Original bundle
Name: vbac033.pdf
Size: 461.85 KB
Format: Adobe Portable Document Format