Principled distillation of UK Biobank phenotype data reveals underlying structure in human variation

dc.contributor.author: Carey, Caitlin E.
dc.contributor.author: Shafee, Rebecca
dc.contributor.author: Wedow, Robbee
dc.contributor.author: Elliott, Amanda
dc.contributor.author: Palmer, Duncan S.
dc.contributor.author: Compitello, John
dc.contributor.author: Kanai, Masahiro
dc.contributor.author: Abbott, Liam
dc.contributor.author: Schultz, Patrick
dc.contributor.author: Karczewski, Konrad J.
dc.contributor.author: Bryant, Samuel C.
dc.contributor.author: Cusick, Caroline M.
dc.contributor.author: Churchhouse, Claire
dc.contributor.author: Howrigan, Daniel P.
dc.contributor.author: King, Daniel
dc.contributor.author: Smith, George Davey
dc.contributor.author: Neale, Benjamin M.
dc.contributor.author: Walters, Raymond K.
dc.contributor.author: Robinson, Elise B.
dc.contributor.department: Medical and Molecular Genetics, School of Medicine
dc.date.accessioned: 2024-10-15T08:42:32Z
dc.date.available: 2024-10-15T08:42:32Z
dc.date.issued: 2024
dc.description.abstract: Data within biobanks capture broad yet detailed indices of human variation, but biobank-wide insights can be difficult to extract due to complexity and scale. Here, using large-scale factor analysis, we distill hundreds of variables (diagnoses, assessments and survey items) into 35 latent constructs, using data from unrelated individuals with predominantly estimated European genetic ancestry in UK Biobank. These factors recapitulate known disease classifications, disentangle elements of socioeconomic status, highlight the relevance of psychiatric constructs to health and improve measurement of pro-health behaviours. We go on to demonstrate the power of this approach to clarify genetic signal, enhance discovery and identify associations between underlying phenotypic structure and health outcomes. In building a deeper understanding of ways in which constructs such as socioeconomic status, trauma, or physical activity are structured in the dataset, we emphasize the importance of considering the interwoven nature of the human phenome when evaluating public health patterns.
dc.eprint.version: Final published version
dc.identifier.citation: Carey CE, Shafee R, Wedow R, et al. Principled distillation of UK Biobank phenotype data reveals underlying structure in human variation. Nat Hum Behav. 2024;8(8):1599-1615. doi:10.1038/s41562-024-01909-5
dc.identifier.uri: https://hdl.handle.net/1805/43941
dc.language.iso: en_US
dc.publisher: Springer Nature
dc.relation.isversionof: 10.1038/s41562-024-01909-5
dc.relation.journal: Nature Human Behaviour
dc.rights: Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.source: PMC
dc.subject: Epidemiology
dc.subject: Data integration
dc.subject: Statistical methods
dc.title: Principled distillation of UK Biobank phenotype data reveals underlying structure in human variation
dc.type: Article
Files
Name: Carey2024Principled-CCBY.pdf
Size: 2.31 MB
Format: Adobe Portable Document Format