CTD: An information-theoretic algorithm to interpret sets of metabolomic and transcriptomic perturbations in the context of graphical models

If you need an accessible version of this item, please submit a remediation request.
Date
2021-01
Language
American English
Embargo Lift Date
Committee Members
Degree
Degree Year
Department
Grantor
Journal Title
Journal ISSN
Volume Title
Found At
PLOS
Abstract

We consider the following general family of algorithmic problems that arises in transcriptomics, metabolomics and other fields: given a weighted graph G and a subset of its nodes S, find subsets of S that show significant connectedness within G. A specific solution to this problem may be defined by devising a scoring function, the Maximum Clique problem being a classic example, where S includes all nodes in G and where the score is defined by the size of the largest subset of S fully connected within G. Major practical obstacles for the plethora of algorithms addressing this type of problem include computational efficiency and, particularly for more complex scores which take edge weights into account, the computational cost of permutation testing, a statistical procedure required to obtain a bound on the p-value for a connectedness score. To address these problems, we developed CTD, "Connect the Dots", a fast algorithm based on data compression that detects highly connected subsets within S. CTD provides information-theoretic upper bounds on p-values when S contains a small fraction of nodes in G without requiring computationally costly permutation testing. We apply the CTD algorithm to interpret multi-metabolite perturbations due to inborn errors of metabolism and multi-transcript perturbations associated with breast cancer in the context of disease-specific Gaussian Markov Random Field networks learned directly from respective molecular profiling data.

Description
item.page.description.tableofcontents
item.page.relation.haspart
Cite As
Thistlethwaite LR, Petrosyan V, Li X, Miller MJ, Elsea SH, Milosavljevic A. CTD: An information-theoretic algorithm to interpret sets of metabolomic and transcriptomic perturbations in the context of graphical models [published correction appears in PLoS Comput Biol. 2021 Oct 25;17(10):e1009551]. PLoS Comput Biol. 2021;17(1):e1008550. Published 2021 Jan 29. doi:10.1371/journal.pcbi.1008550
ISSN
Publisher
Series/Report
Sponsorship
Major
Extent
Identifier
Relation
Journal
PLOS COMPUTATIONAL BIOLOGY
Source
PMC
Alternative Title
Type
Article
Number
Volume
Conference Dates
Conference Host
Conference Location
Conference Name
Conference Panel
Conference Secretariat Location
Version
Final published version
Full Text Available at
This item is under embargo {{howLong}}