Addressing overfitting bias due to sample overlap in polygenic risk scoring

dc.contributor.author Jeong, Seokho
dc.contributor.author Shivakumar, Manu
dc.contributor.author Jung, Sang-Hyuk
dc.contributor.author Won, Hong-Hee
dc.contributor.author Nho, Kwangsik
dc.contributor.author Huang, Heng
dc.contributor.author Davatzikos, Christos
dc.contributor.author Saykin, Andrew J.
dc.contributor.author Thompson, Paul M.
dc.contributor.author Shen, Li
dc.contributor.author Kim, Young Jin
dc.contributor.author Kim, Bong-Jo
dc.contributor.author Lee, Seunggeun
dc.contributor.author Kim, Dokyoon
dc.contributor.department Radiology and Imaging Sciences, School of Medicine
dc.date.accessioned 2025-05-20T09:33:42Z
dc.date.available 2025-05-20T09:33:42Z
dc.date.issued 2025
dc.description.abstract Introduction: Numerous studies on Alzheimer's disease polygenic risk scores (PRSs) overlook sample overlap between the International Genomics of Alzheimer's Project (IGAP) and target datasets such as the Alzheimer's Disease Neuroimaging Initiative (ADNI). Methods: To address this, we developed an overlap-adjusted PRS (OA PRS) and tested it on simulated data, assessing bias under different scenarios by varying the training, testing, and overlap proportions. After using OA PRS to adjust for sample-overlap bias in the simulations, we applied it to the IGAP and ADNI datasets and validated the results through visual diagnostics. Results: OA PRS effectively adjusted for sample overlap in all simulation scenarios, as well as for IGAP and ADNI. The original IGAP PRS showed an inflated area under the receiver operating characteristic curve (AUROC: 0.915) on overlapping samples. OA PRS reduced the AUROC to 0.726, closely aligning with the AUROC of non-overlapping samples (0.712). Further, visual diagnostics confirmed the effectiveness of our adjustments. Discussion: With OA PRS, we were able to adjust the IGAP summary-based PRS for the overlapping ADNI samples, allowing the dataset to be fully used without the risk of overfitting. Highlights: Sample overlap between large Alzheimer's disease (AD) cohorts introduces overfitting bias when using AD polygenic risk scores (PRSs). This study highlighted the effectiveness of the overlap-adjusted PRS (OA PRS) in mitigating overfitting and improving the accuracy of PRS estimations. New PRSs based on adjusted effect sizes showed increased power in association with clinical features.
dc.eprint.version Final published version
dc.identifier.citation Jeong S, Shivakumar M, Jung SH, et al. Addressing overfitting bias due to sample overlap in polygenic risk scoring. Alzheimers Dement. 2025;21(4):e70109. doi:10.1002/alz.70109
dc.identifier.uri https://hdl.handle.net/1805/48253
dc.language.iso en_US
dc.publisher Wiley
dc.relation.isversionof 10.1002/alz.70109
dc.relation.journal Alzheimer's & Dementia
dc.rights Attribution 4.0 International
dc.rights.uri http://creativecommons.org/licenses/by/4.0/
dc.source PMC
dc.subject Alzheimer's disease
dc.subject Genetic risk factor
dc.subject Polygenic risk scores
dc.subject Precision medicine
dc.subject Sample overlap
dc.title Addressing overfitting bias due to sample overlap in polygenic risk scoring
dc.type Article
Files
Original bundle
Name:
Jeong2025Addressing-CCBY.pdf
Size:
1.61 MB
Format:
Adobe Portable Document Format
License bundle
Name:
license.txt
Size:
2.04 KB
Format:
Item-specific license agreed upon to submission
Description: