Variational Autoencoder-based Model Improves Polygenic Prediction in Blood Cell Traits

Li, Xiaoqi; Pang, Minxing; Wen, Jia; Zhou, Laura Y.; Raffield, Laura M.; Zhou, Haibo; Yao, Huaxiu; Chen, Can; Sun, Quan; Li, Yun

dc.contributor.author	Li, Xiaoqi
dc.contributor.author	Pang, Minxing
dc.contributor.author	Wen, Jia
dc.contributor.author	Zhou, Laura Y.
dc.contributor.author	Raffield, Laura M.
dc.contributor.author	Zhou, Haibo
dc.contributor.author	Yao, Huaxiu
dc.contributor.author	Chen, Can
dc.contributor.author	Sun, Quan
dc.contributor.author	Li, Yun
dc.contributor.department	Biostatistics and Health Data Science, Richard M. Fairbanks School of Public Health
dc.date.accessioned	2025-02-20T11:45:14Z
dc.date.available	2025-02-20T11:45:14Z
dc.date.issued	2025-01-18
dc.description.abstract	Genetic prediction of complex traits, enabled by large-scale genomic studies, has created new means of understanding individual genetic predisposition. Polygenic Risk Scores (PRS) aggregate information across the genome, enabling personalized risk prediction for complex traits and diseases. However, conventional PRS calculation methods that rely on linear models are limited in their ability to capture complex patterns and interaction effects in high-dimensional genomic data. In this study, we seek to improve the predictive power of PRS by applying advanced deep learning techniques. We show that the Variational AutoEncoder-based model for PRS construction (VAE-PRS) outperforms current state-of-the-art methods for biobank-level data in 14 out of 16 blood cell traits, while being computationally efficient. Through comprehensive experiments, we found that the VAE-PRS model can capture interaction effects in high-dimensional data and shows robust performance across different pre-screened variant sets. Furthermore, VAE-PRS is readily interpretable: the contribution of each individual marker to the final prediction score can be assessed through the SHapley Additive exPlanations (SHAP) method, providing potential new insights for identifying trait-associated genetic variants. In summary, VAE-PRS presents a novel approach to genetic risk prediction by harnessing the power of deep learning methods, which could further facilitate the development of personalized medicine and genetic research.
dc.eprint.version	Preprint
dc.identifier.citation	Li X, Pang M, Wen J, et al. Variational Autoencoder-based Model Improves Polygenic Prediction in Blood Cell Traits. Preprint. bioRxiv. 2025;2025.01.13.632820. Published 2025 Jan 18. doi:10.1101/2025.01.13.632820
dc.identifier.uri	https://hdl.handle.net/1805/45874
dc.language.iso	en_US
dc.publisher	bioRxiv
dc.relation.isversionof	10.1101/2025.01.13.632820
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	en
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/4.0
dc.source	PMC
dc.subject	Genetic predisposition
dc.subject	Polygenic Risk Scores (PRS)
dc.subject	Personalized risk prediction
dc.title	Variational Autoencoder-based Model Improves Polygenic Prediction in Blood Cell Traits
dc.type	Article

Files

Original bundle

Name: Li2025Variational-CCBYNCND.pdf
Size: 1.66 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.04 KB
Format: Item-specific license agreed to upon submission

Collections

Open Access Policy Articles
Biostatistics and Health Data Science Works