Deep learning-based identification of genetic variants: application to Alzheimer’s disease classification

dc.contributor.author: Jo, Taeho
dc.contributor.author: Nho, Kwangsik
dc.contributor.author: Bice, Paula
dc.contributor.author: Saykin, Andrew J.
dc.contributor.department: Alzheimer’s Disease Neuroimaging Initiative
dc.date.accessioned: 2023-10-24T15:32:04Z
dc.date.available: 2023-10-24T15:32:04Z
dc.date.issued: 2022
dc.description.abstract: Deep learning is a promising tool that uses nonlinear transformations to extract features from high-dimensional data. Applying deep learning to genome-wide association studies (GWAS), with their high-dimensional genomic data, remains challenging. Here we propose a novel three-step approach (SWAT-CNN) that uses deep learning to identify phenotype-related single nucleotide polymorphisms (SNPs), which can then be used to develop accurate disease classification models. In the first step, we divided the whole genome into nonoverlapping fragments of an optimal size and then ran a convolutional neural network (CNN) on each fragment to select phenotype-associated fragments. In the second step, using a Sliding Window Association Test (SWAT), we ran a CNN on the selected fragments to calculate phenotype influence scores (PIS) and identified phenotype-associated SNPs based on PIS. In the third step, we ran a CNN on all identified SNPs to develop a classification model. We tested our approach using GWAS data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) (N = 981; cognitively normal older adults (CN) = 650 and AD = 331). Our approach identified the well-known APOE region as the most significant genetic locus for AD. Our classification model achieved an area under the curve (AUC) of 0.82, which was comparable to that of traditional machine learning approaches, random forest and XGBoost. SWAT-CNN, a novel deep learning-based genome-wide approach, identified AD-associated SNPs and yielded a classification model for AD, and may hold promise for a range of biomedical applications.
dc.eprint.version: Final published version
dc.identifier.citation: Jo T, Nho K, Bice P, Saykin AJ; Alzheimer’s Disease Neuroimaging Initiative. Deep learning-based identification of genetic variants: application to Alzheimer's disease classification. Brief Bioinform. 2022;23(2):bbac022. doi:10.1093/bib/bbac022
dc.identifier.uri: https://hdl.handle.net/1805/36605
dc.language.iso: en_US
dc.publisher: Oxford University Press
dc.relation.isversionof: 10.1093/bib/bbac022
dc.relation.journal: Briefings in Bioinformatics
dc.rights: Publisher Policy
dc.source: PMC
dc.subject: Alzheimer’s disease
dc.subject: Deep learning
dc.subject: Genetic variants
dc.subject: Genome-wide association studies
dc.subject: Phenotype influence scores
dc.title: Deep learning-based identification of genetic variants: application to Alzheimer’s disease classification
dc.type: Article
ul.alternative.fulltext: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8921609/
Files
Original bundle
Name: bbac022.pdf
Size: 1.01 MB
Format: Adobe Portable Document Format
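The first two steps described in the abstract (nonoverlapping fragments, then sliding windows over the selected fragments) can be illustrated with a minimal sketch. This is not the authors' implementation: every function name below is hypothetical, `toy_score` is a simple correlation-based stand-in for training a CNN on a block of SNPs, and averaging window scores per SNP is an assumed simplification rather than the paper's definition of PIS. The data are simulated.

```python
import numpy as np


def split_into_fragments(n_snps, fragment_size):
    """Step 1 layout: nonoverlapping [start, end) ranges covering all SNPs."""
    return [(s, min(s + fragment_size, n_snps))
            for s in range(0, n_snps, fragment_size)]


def sliding_windows(start, end, window_size, stride=1):
    """Step 2 layout: overlapping [start, end) windows inside one fragment."""
    last = max(start, end - window_size)
    return [(s, min(s + window_size, end))
            for s in range(start, last + 1, stride)]


def toy_score(genotypes, labels):
    """Hypothetical stand-in for a CNN trained on one block of SNPs:
    the largest absolute genotype-phenotype correlation in the block."""
    g = genotypes - genotypes.mean(axis=0)
    y = labels - labels.mean()
    denom = np.sqrt((g ** 2).sum(axis=0) * (y ** 2).sum())
    # Constant columns have zero variance; give them a correlation of 0.
    corr = np.divide(g.T @ y, denom, out=np.zeros(g.shape[1]), where=denom > 0)
    return float(np.abs(corr).max())


def swat_sketch(genotypes, labels, fragment_size, window_size, top_fragments):
    """Return one score per SNP (assumed here: mean score of covering windows)."""
    n_snps = genotypes.shape[1]
    fragments = split_into_fragments(n_snps, fragment_size)

    # Step 1: score every fragment and keep the highest-scoring ones.
    ranked = sorted(
        fragments,
        key=lambda f: toy_score(genotypes[:, f[0]:f[1]], labels),
        reverse=True,
    )
    selected = ranked[:top_fragments]

    # Step 2: slide windows over selected fragments and accumulate scores.
    total = np.zeros(n_snps)
    count = np.zeros(n_snps)
    for start, end in selected:
        for ws, we in sliding_windows(start, end, window_size):
            score = toy_score(genotypes[:, ws:we], labels)
            total[ws:we] += score
            count[ws:we] += 1
    # SNPs outside the selected fragments keep a score of 0.
    return np.divide(total, count, out=np.zeros(n_snps), where=count > 0)


# Simulated data: 200 samples, 60 SNPs coded 0/1/2, with SNP 37 made causal.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(200, 60)).astype(float)
labels = (genotypes[:, 37] >= 1).astype(float)

scores = swat_sketch(genotypes, labels, fragment_size=20,
                     window_size=5, top_fragments=1)
print("highest-scoring SNP index:", int(np.argmax(scores)))
```

In a real pipeline the scorer would be replaced by a trained classifier, and the third step would fit a final model on the SNPs that pass a score threshold; both are outside the scope of this sketch.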