Linkage Disequilibrium-Informed Deep Learning Framework to Identify Genetic Loci for Alzheimer’s Disease Using Whole Genome Sequencing Data

dc.contributor.author: Jo, Taeho
dc.contributor.author: Bice, Paula
dc.contributor.author: Nho, Kwangsik
dc.contributor.author: Saykin, Andrew J.
dc.contributor.author: Alzheimer’s Disease Sequencing Project
dc.contributor.department: Radiology and Imaging Sciences, School of Medicine
dc.date.accessioned: 2024-11-11T17:36:40Z
dc.date.available: 2024-11-11T17:36:40Z
dc.date.issued: 2024-09-22
dc.description.abstract: The exponential growth of genomic datasets necessitates advanced analytical tools to effectively identify genetic loci from large-scale high-throughput sequencing data. This study presents Deep-Block, a multi-stage deep learning framework that incorporates biological knowledge into its AI architecture to identify genetic regions significantly associated with Alzheimer's disease (AD). The framework employs a three-stage approach: (1) genome segmentation based on linkage disequilibrium (LD) patterns, (2) selection of relevant LD blocks using sparse attention mechanisms, and (3) application of TabNet and Random Forest algorithms to quantify single nucleotide polymorphism (SNP) feature importance, thereby identifying genetic factors contributing to AD risk. Deep-Block was applied to a large-scale whole genome sequencing (WGS) dataset from the Alzheimer's Disease Sequencing Project (ADSP), comprising 7,416 non-Hispanic white participants (3,150 cognitively normal older adults (CN) and 4,266 with AD). First, 30,218 LD blocks were identified and ranked by their relevance to AD. Subsequently, Deep-Block identified novel SNPs within the top 1,500 LD blocks and confirmed previously known variants, including APOE rs429358 and rs769449. The results were cross-validated against established AD-associated loci from the European Alzheimer's and Dementia Biobank (EADB) and the GWAS catalog. The Deep-Block framework effectively processes large-scale high-throughput sequencing data while preserving interactions between SNPs during dimensionality reduction, a step that can otherwise introduce bias or lead to information loss. The Deep-Block approach identified both known and novel genetic variation, enhancing our understanding of the genetic architecture of AD and demonstrating the framework's potential for application in large-scale sequencing studies.
dc.eprint.version: Preprint
dc.identifier.citation: Jo T, Bice P, Nho K, Saykin AJ; Alzheimer’s Disease Sequencing Project. Linkage Disequilibrium-Informed Deep Learning Framework to Identify Genetic Loci for Alzheimer's Disease Using Whole Genome Sequencing Data. Preprint. medRxiv. 2024;2024.09.19.24313993. Published 2024 Sep 22. doi:10.1101/2024.09.19.24313993
dc.identifier.uri: https://hdl.handle.net/1805/44473
dc.language.iso: en_US
dc.publisher: medRxiv
dc.relation.isversionof: 10.1101/2024.09.19.24313993
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0
dc.source: PMC
dc.subject: Genetic loci
dc.subject: Deep-Block
dc.subject: Alzheimer's disease (AD)
dc.title: Linkage Disequilibrium-Informed Deep Learning Framework to Identify Genetic Loci for Alzheimer’s Disease Using Whole Genome Sequencing Data
dc.type: Article
Files
Original bundle
Name: Jo2024Linkage-CCBYNCND.pdf
Size: 1.21 MB
Format: Adobe Portable Document Format