Explicit Modeling of Ancestry Improves Polygenic Risk Scores and BLUP Prediction

Polygenic prediction using genome-wide SNPs can provide high prediction accuracy for complex traits. Here, we investigate how to account for genetic ancestry when conducting polygenic prediction. We show that the accuracy of polygenic prediction in structured populations may be partly due to genetic ancestry. However, we hypothesized that explicitly modeling ancestry could improve polygenic prediction accuracy. We analyzed three GWAS, of hair color (HC), tanning ability (TA), and basal cell carcinoma (BCC), in European Americans (sample sizes ranging from 7,440 to 9,822) and considered two widely used polygenic prediction approaches: polygenic risk scores (PRSs) and best linear unbiased prediction (BLUP). We compared polygenic prediction without correction for ancestry to polygenic prediction with ancestry as a separate component in the model. In 10-fold cross-validation using the PRS approach, the R² for HC increased by 66% (from 0.0456 to 0.0755; P < 10⁻¹⁶), the R² for TA increased by 123% (from 0.0154 to 0.0344; P < 10⁻¹⁶), and the liability-scale R² for BCC increased by 68% (from 0.0138 to 0.0232; P < 10⁻¹⁶) when explicitly modeling ancestry, which prevents ancestry effects from entering into each SNP effect and being overweighted. Surprisingly, explicitly modeling ancestry produces a similar improvement when using the BLUP approach, which fits all SNPs simultaneously in a single variance component and causes ancestry to be underweighted. We validate our findings via simulations, which show that the differences in prediction accuracy will increase in magnitude as sample sizes increase. In summary, our results show that explicitly modeling ancestry can be important in both PRS and BLUP prediction.
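
The comparison described in the abstract (a PRS built without ancestry correction versus a model that carries ancestry as its own component) can be illustrated with a small simulation. The sketch below is not the authors' pipeline. The simulated two-population data, the use of the top two principal components as the ancestry term, the single train/test split in place of 10-fold cross-validation, and all function names are assumptions made purely for illustration.

```python
import numpy as np

def simulate(n=600, m=300, seed=0):
    """Toy structured sample: two subpopulations with shifted allele frequencies
    and a phenotype that depends on both SNPs and ancestry (all values assumed)."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=n)                      # subpopulation label
    p0 = rng.uniform(0.1, 0.9, size=m)
    p1 = np.clip(p0 + rng.normal(0, 0.1, size=m), 0.05, 0.95)
    freqs = np.where(pop[:, None] == 0, p0, p1)
    G = rng.binomial(2, freqs).astype(float)
    # Standardized on the full sample for brevity (a simplification).
    G = (G - G.mean(0)) / G.std(0)
    beta = rng.normal(0, np.sqrt(0.3 / m), size=m)
    y = G @ beta + 0.5 * pop + rng.normal(0, 0.8, size=n)
    return G, y

def pc_loadings(G_train, k):
    """SNP loadings of the top-k principal components of the training genotypes."""
    _, _, Vt = np.linalg.svd(G_train, full_matrices=False)
    return Vt[:k].T

def residualize(A, C):
    """Remove from A the part explained by covariates C (intercept included)."""
    X = np.column_stack([np.ones(C.shape[0]), C])
    coef, *_ = np.linalg.lstsq(X, A, rcond=None)
    return A - X @ coef

def marginal_effects(G, y):
    """Per-SNP ordinary least squares slopes, one SNP at a time."""
    Gc = G - G.mean(0)
    return Gc.T @ (y - y.mean()) / (Gc ** 2).sum(axis=0)

def r2(y, yhat):
    return np.corrcoef(y, yhat)[0, 1] ** 2

def compare(G, y, k=2, n_train=400):
    Gtr, Gte, ytr, yte = G[:n_train], G[n_train:], y[:n_train], y[n_train:]

    # (a) PRS without ancestry correction: ancestry leaks into every SNP weight.
    prs_raw = Gte @ marginal_effects(Gtr, ytr)

    # (b) Ancestry as a separate component: SNP weights are estimated after
    # adjusting for PCs, then PCs and the PRS are combined in one linear model.
    # The PRS weight is fitted in-sample here, which is a further simplification.
    L = pc_loadings(Gtr, k)
    pc_tr, pc_te = Gtr @ L, Gte @ L
    b_adj = marginal_effects(residualize(Gtr, pc_tr), residualize(ytr, pc_tr))
    X_tr = np.column_stack([np.ones(len(ytr)), pc_tr, Gtr @ b_adj])
    w, *_ = np.linalg.lstsq(X_tr, ytr, rcond=None)
    X_te = np.column_stack([np.ones(len(yte)), pc_te, Gte @ b_adj])

    return r2(yte, prs_raw), r2(yte, X_te @ w)

if __name__ == "__main__":
    G, y = simulate()
    r2_raw, r2_anc = compare(G, y)
    print(f"R2 without ancestry correction: {r2_raw:.4f}")
    print(f"R2 with ancestry as a separate component: {r2_anc:.4f}")
```

The R² values this toy example produces depend entirely on the simulated parameters and should not be compared with the figures reported in the abstract.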

Keywords

Basal cell carcinoma, Genome-wide association study, Pigmentation, Polygenic prediction, Principal component analysis

Cite As

Chen, C.-Y., Han, J., Hunter, D. J., Kraft, P., & Price, A. L. (2015). Explicit modeling of ancestry improves polygenic risk scores and BLUP prediction. Genetic Epidemiology, 39(6), 427–438. http://doi.org/10.1002/gepi.21906

Journal

Genetic Epidemiology

Rights

Publisher Policy

Source

PMC

Type

Article

Permanent Link

https://hdl.handle.net/1805/12500

DOI

https://doi.org/10.1002/gepi.21906

Version

Author's manuscript

Collections

Open Access Policy Articles
Department of Epidemiology Works
