- Browse by Author
Browsing by Author "McDonald, Daniel J."
Now showing 1 - 2 of 2
Results Per Page
Sort Options
Item Flexible analysis of TSS mapping data and detection of TSS shifts with TSRexploreR(Oxford University Press, 2021-06) Policastro, Robert A.; McDonald, Daniel J.; Brendel, Volker P.; Zentner, Gabriel E.; Biology, School of ScienceHeterogeneity in transcription initiation has important consequences for transcript stability and translation, and shifts in transcription start site (TSS) usage are prevalent in various developmental, metabolic, and disease contexts. Accordingly, numerous methods for global TSS profiling have been developed, including most recently Survey of TRanscription Initiation at Promoter Elements with high-throughput sequencing (STRIPE-seq), a method to profile transcription start sites (TSSs) on a genome-wide scale with significant cost and time savings compared to previous methods. In anticipation of more widespread adoption of STRIPE-seq and related methods for construction of promoter atlases and studies of differential gene expression, we built TSRexploreR, an R package for end-to-end analysis of TSS mapping data. TSRexploreR provides functions for TSS and transcription start region (TSR) detection, normalization, correlation, visualization, and differential TSS/TSR analyses. TSRexploreR is highly interoperable, accepting the data structures of TSS and TSR sets generated by several existing tools for processing and alignment of TSS mapping data, such as CAGEr for Cap Analysis of Gene Expression (CAGE) data. Lastly, TSRexploreR implements a novel approach for the detection of shifts in TSS distribution.Item Sufficient principal component regression for pattern discovery in transcriptomic data(Oxford University Press, 2022-05-14) Ding, Lei; Zentner, Gabriel E.; McDonald, Daniel J.; Biology, School of ScienceMotivation: Methods for the global measurement of transcript abundance such as microarrays and RNA-Seq generate datasets in which the number of measured features far exceeds the number of observations. Extracting biologically meaningful and experimentally tractable insights from such data therefore requires high-dimensional prediction. Existing sparse linear approaches to this challenge have been stunningly successful, but some important issues remain. These methods can fail to select the correct features, predict poorly relative to non-sparse alternatives or ignore any unknown grouping structures for the features. Results: We propose a method called SuffPCR that yields improved predictions in high-dimensional tasks including regression and classification, especially in the typical context of omics with correlated features. SuffPCR first estimates sparse principal components and then estimates a linear model on the recovered subspace. Because the estimated subspace is sparse in the features, the resulting predictions will depend on only a small subset of genes. SuffPCR works well on a variety of simulated and experimental transcriptomic data, performing nearly optimally when the model assumptions are satisfied. We also demonstrate near-optimal theoretical guarantees. Availability and implementation: Code and raw data are freely available at https://github.com/dajmcdon/suffpcr. Package documentation may be viewed at https://dajmcdon.github.io/suffpcr.