CTD: An information-theoretic algorithm to interpret sets of metabolomic and transcriptomic perturbations in the context of graphical models

Thistlethwaite, Lillian R.; Petrosyan, Varduhi; Li, Xiqi; Miller, Marcus J.; Elsea, Sarah H.; Milosavljevic, Aleksandar

dc.contributor.author	Thistlethwaite, Lillian R.
dc.contributor.author	Petrosyan, Varduhi
dc.contributor.author	Li, Xiqi
dc.contributor.author	Miller, Marcus J.
dc.contributor.author	Elsea, Sarah H.
dc.contributor.author	Milosavljevic, Aleksandar
dc.contributor.department	Medical and Molecular Genetics, School of Medicine	en_US
dc.date.accessioned	2022-05-27T10:17:57Z
dc.date.available	2022-05-27T10:17:57Z
dc.date.issued	2021-01
dc.description.abstract	We consider the following general family of algorithmic problems that arises in transcriptomics, metabolomics and other fields: given a weighted graph G and a subset of its nodes S, find subsets of S that show significant connectedness within G. A specific solution to this problem may be defined by devising a scoring function, the Maximum Clique problem being a classic example, where S includes all nodes in G and where the score is defined by the size of the largest subset of S fully connected within G. Major practical obstacles for the plethora of algorithms addressing this type of problem include computational efficiency and, particularly for more complex scores which take edge weights into account, the computational cost of permutation testing, a statistical procedure required to obtain a bound on the p-value for a connectedness score. To address these problems, we developed CTD, "Connect the Dots", a fast algorithm based on data compression that detects highly connected subsets within S. CTD provides information-theoretic upper bounds on p-values when S contains a small fraction of nodes in G without requiring computationally costly permutation testing. We apply the CTD algorithm to interpret multi-metabolite perturbations due to inborn errors of metabolism and multi-transcript perturbations associated with breast cancer in the context of disease-specific Gaussian Markov Random Field networks learned directly from respective molecular profiling data.	en_US
dc.eprint.version	Final published version	en_US
dc.identifier.citation	Thistlethwaite LR, Petrosyan V, Li X, Miller MJ, Elsea SH, Milosavljevic A. CTD: An information-theoretic algorithm to interpret sets of metabolomic and transcriptomic perturbations in the context of graphical models [published correction appears in PLoS Comput Biol. 2021 Oct 25;17(10):e1009551]. PLoS Comput Biol. 2021;17(1):e1008550. Published 2021 Jan 29. doi:10.1371/journal.pcbi.1008550	en_US
dc.identifier.uri	https://hdl.handle.net/1805/29158
dc.language.iso	en_US	en_US
dc.publisher	PLOS	en_US
dc.relation.isversionof	10.1371/journal.pcbi.1008550	en_US
dc.relation.journal	PLOS COMPUTATIONAL BIOLOGY	en_US
dc.rights	Attribution 4.0 International	*
dc.rights.uri	https://creativecommons.org/licenses/by/4.0	*
dc.source	PMC	en_US
dc.subject	Gene Expression Profiling	en_US
dc.subject	Metabolome	en_US
dc.subject	Transcriptome	en_US
dc.title	CTD: An information-theoretic algorithm to interpret sets of metabolomic and transcriptomic perturbations in the context of graphical models	en_US
dc.type	Article	en_US
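
The abstract frames the general problem as scoring how well subsets of a node set S are connected within a graph G, with Maximum Clique as the classic example. The sketch below illustrates only that classic score on a toy graph, by brute force. It is not the CTD algorithm and does not use edge weights or compression. The function name `largest_connected_subset` and the example graph are hypothetical and chosen for illustration.

```python
from itertools import combinations


def largest_connected_subset(edges, S):
    """Return one largest subset of S whose members are pairwise adjacent in G.

    Brute-force search, exponential in len(S); suitable only for tiny examples.
    """
    # Store undirected edges as frozensets so (u, v) and (v, u) match.
    adjacent = {frozenset(e) for e in edges}
    nodes = sorted(S)
    # Try candidate sizes from largest to smallest and stop at the first hit.
    for k in range(len(nodes), 0, -1):
        for subset in combinations(nodes, k):
            if all(frozenset(pair) in adjacent for pair in combinations(subset, 2)):
                return set(subset)
    return set()


# Hypothetical toy graph G: a triangle a-b-c with a tail c-d-e.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
all_nodes = {"a", "b", "c", "d", "e"}

# With S equal to all nodes of G, the score is the Maximum Clique size.
clique = largest_connected_subset(edges, all_nodes)

# With a smaller S, only nodes in S may appear in the returned subset.
restricted = largest_connected_subset(edges, {"a", "b", "d", "e"})

print(len(clique), sorted(clique))
print(len(restricted), sorted(restricted))
```

The exhaustive search makes the cost concern raised in the abstract concrete: the number of candidate subsets grows exponentially with the size of S, before any permutation testing is considered.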

Files

Original bundle

Name: pcbi.1008550.pdf
Size: 4.39 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.99 KB
Format: Item-specific license agreed upon to submission

Collections

Open Access Policy Articles
Department of Medical and Molecular Genetics Works