Unsupervised representation learning improves genomic discovery and risk prediction for respiratory and circulatory functions and diseases

dc.contributor.author: Yun, Taedong
dc.contributor.author: Cosentino, Justin
dc.contributor.author: Behsaz, Babak
dc.contributor.author: McCaw, Zachary R.
dc.contributor.author: Hill, Davin
dc.contributor.author: Luben, Robert
dc.contributor.author: Lai, Dongbing
dc.contributor.author: Bates, John
dc.contributor.author: Yang, Howard
dc.contributor.author: Schwantes-An, Tae-Hwi
dc.contributor.author: Zhou, Yuchen
dc.contributor.author: Khawaja, Anthony P.
dc.contributor.author: Carroll, Andrew
dc.contributor.author: Hobbs, Brian D.
dc.contributor.author: Cho, Michael H.
dc.contributor.author: McLean, Cory Y.
dc.contributor.author: Hormozdiari, Farhad
dc.contributor.department: Medical and Molecular Genetics, School of Medicine
dc.date.accessioned: 2024-01-02T14:57:41Z
dc.date.available: 2024-01-02T14:57:41Z
dc.date.issued: 2023-08-29
dc.description.abstract: High-dimensional clinical data are becoming more accessible in biobank-scale datasets. However, effectively utilizing high-dimensional clinical data for genetic discovery remains challenging. Here we introduce a general deep learning-based framework, REpresentation learning for Genetic discovery on Low-dimensional Embeddings (REGLE), for discovering associations between genetic variants and high-dimensional clinical data. REGLE uses convolutional variational autoencoders to compute a non-linear, low-dimensional, disentangled embedding of the data with highly heritable individual components. REGLE can incorporate expert-defined or clinical features and provides a framework to create accurate disease-specific polygenic risk scores (PRS) in datasets which have minimal expert phenotyping. We apply REGLE to both respiratory and circulatory systems: spirograms which measure lung function and photoplethysmograms (PPG) which measure blood volume changes. Genome-wide association studies on REGLE embeddings identify more genome-wide significant loci than existing methods and replicate known loci for both spirograms and PPG, demonstrating the generality of the framework. Furthermore, these embeddings are associated with overall survival. Finally, we construct a set of PRSs that improve predictive performance of asthma, chronic obstructive pulmonary disease, hypertension, and systolic blood pressure in multiple biobanks. Thus, REGLE embeddings can quantify clinically relevant features that are not currently captured in a standardized or automated way.
dc.eprint.version: Pre-Print
dc.identifier.citation: Yun T, Cosentino J, Behsaz B, et al. Unsupervised representation learning improves genomic discovery and risk prediction for respiratory and circulatory functions and diseases. Preprint. medRxiv. 2023;2023.04.28.23289285. Published 2023 Aug 29. doi:10.1101/2023.04.28.23289285
dc.identifier.uri: https://hdl.handle.net/1805/37535
dc.language.iso: en_US
dc.publisher: medRxiv
dc.relation.isversionof: 10.1101/2023.04.28.23289285
dc.rights: Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.source: PMC
dc.subject: Biobank-scale datasets
dc.subject: Genetic discovery
dc.subject: High-dimensional clinical data
dc.title: Unsupervised representation learning improves genomic discovery and risk prediction for respiratory and circulatory functions and diseases
dc.type: Article
Files
Original bundle
Name: nihpp-2023.04.28.23289285v2.pdf
Size: 2.56 MB
Format: Adobe Portable Document Format