Addressing overfitting bias due to sample overlap in polygenic risk scoring

Date
2025
Language
American English
Found At
Wiley
Abstract

Introduction: Numerous studies of Alzheimer's disease polygenic risk scores (PRSs) overlook the sample overlap between the International Genomics of Alzheimer's Project (IGAP) and target datasets such as the Alzheimer's Disease Neuroimaging Initiative (ADNI).

Methods: To address this, we developed an overlap-adjusted PRS (OA PRS) and tested it on simulated data, varying the training, testing, and overlap proportions to assess the bias arising in different scenarios. After using OA PRS to adjust for sample overlap bias in the simulations, we applied it to the IGAP and ADNI datasets and validated the results through visual diagnostics.

Results: OA PRS effectively adjusted for sample overlap in all simulation scenarios, as well as for IGAP and ADNI. The original IGAP PRS showed an inflated area under the receiver operating characteristic curve (AUROC) of 0.915 on overlapping samples. OA PRS reduced the AUROC to 0.726, closely aligning with the AUROC of non-overlapping samples (0.712). Further, visual diagnostics confirmed the effectiveness of our adjustments.

Discussion: With OA PRS, we were able to adjust the IGAP summary-based PRS for the overlapping ADNI samples, allowing the full dataset to be used without the risk of overfitting.

Highlights: Sample overlap between large Alzheimer's disease (AD) cohorts introduces overfitting bias when using AD polygenic risk scores (PRSs). This study highlighted the effectiveness of overlap-adjusted PRS (OA PRS) in mitigating overfitting and improving the accuracy of PRS estimations. New PRSs based on adjusted effect sizes showed increased power in association with clinical features.
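
The inflation described in the Results can be illustrated with a small simulation. The sketch below is not the OA PRS method, whose details are not given in this record; it only shows, under assumed settings, why scoring samples that also contributed to the effect-size estimates yields an optimistic AUROC. All function names, sample sizes, SNP counts, and effect sizes are hypothetical, and the per-SNP case-control mean difference is a crude stand-in for GWAS summary statistics.

```python
import random

def simulate(n, m, causal, effect, rng):
    """Return (genotypes, labels); genotypes are 0/1/2 allele counts at MAF 0.5."""
    genos, labels = [], []
    for _ in range(n):
        g = [rng.randint(0, 1) + rng.randint(0, 1) for _ in range(m)]
        # Assumed liability model: the first `causal` SNPs carry a small effect.
        liability = sum(effect * (g[j] - 1) for j in range(causal)) + rng.gauss(0, 1)
        genos.append(g)
        labels.append(1 if liability > 0 else 0)
    return genos, labels

def marginal_effects(genos, labels):
    """Per-SNP mean allele count in cases minus controls (stand-in for GWAS betas)."""
    cases = [g for g, y in zip(genos, labels) if y == 1]
    ctrls = [g for g, y in zip(genos, labels) if y == 0]
    return [
        sum(g[j] for g in cases) / len(cases) - sum(g[j] for g in ctrls) / len(ctrls)
        for j in range(len(genos[0]))
    ]

def prs(genos, betas):
    """Polygenic risk score: weighted sum of allele counts."""
    return [sum(b * x for b, x in zip(betas, g)) for g in genos]

def auroc(scores, labels):
    """Probability that a random case outscores a random control (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

rng = random.Random(0)
train_g, train_y = simulate(200, 300, 10, 0.3, rng)  # discovery ("training") samples
test_g, test_y = simulate(200, 300, 10, 0.3, rng)    # independent target samples

betas = marginal_effects(train_g, train_y)
auc_overlap = auroc(prs(train_g, betas), train_y)      # target fully overlaps discovery
auc_independent = auroc(prs(test_g, betas), test_y)    # no overlap

print(f"AUROC, overlapping samples:     {auc_overlap:.3f}")
print(f"AUROC, non-overlapping samples: {auc_independent:.3f}")
```

With many SNPs and few samples, the estimated effects absorb noise specific to the discovery samples, so the score separates those same samples far better than it separates new ones. An adjustment such as OA PRS aims to bring the first number back toward the second.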

Cite As
Jeong S, Shivakumar M, Jung SH, et al. Addressing overfitting bias due to sample overlap in polygenic risk scoring. Alzheimers Dement. 2025;21(4):e70109. doi:10.1002/alz.70109
Journal
Alzheimer's & Dementia
Source
PMC
Type
Article
Version
Final published version