Machine learning models incorporating genotype and ancestry improve severe asthma risk prediction

Date
2025-11-17
Language
American English
Embargo Lift Date
Committee Members
Degree
Degree Year
Department
Grantor
Journal Title
Journal ISSN
Volume Title
Found At
Springer Nature
Can't use the file because of accessibility barriers? Contact us with the title of the item, permanent link, and specifics of your accommodation need.
Abstract

This study proposes a novel machine learning (ML)-based stacking technique that integrates Single Nucleotide Polymorphisms (SNPs) and inferred local ancestry (LA) to improve predictive accuracy in clinical outcomes. Asthma, particularly severe asthma (SA) with poor response to inhaled corticosteroids (ICS), serves as the case study to illustrate this approach. Using data from the Biorepository and Integrative Genomics (BIG) Initiative, which includes whole-exome sequenced data from a self-reported African American pediatric cohort (N=248), we develop an ML framework to predict ICS response. After SNP data preprocessing and LA estimation, we employ stratified 10-fold cross-validation, creating base pipelines for SNP and LA data, which are then combined in stacked pipelines to assess the effectiveness of integrating these distinct data types. The stacked SNP pipeline yields an AUC of 0.693 ± 0.066 and the stacked LA pipeline yields an AUC of 0.625 ± 0.103. The integration of LA with SNP data significantly improves predictive performance, boosting the AUC to 0.729 ± 0.048 (paired t-test p-value = 0.005). Pipelines using LA data alone shows comparable performance to those using SNP data alone. However, the most important contributing features are distinct between LA and SNP data demonstrating that these data types capture distinct sources of variation and could provide complementary insights. This study highlights the potential of stacking ML pipelines, based on feature selection techniques and along with logistic regression and random forest predictive models, to integrate SNP and LA data. Such holistic approach has the promise to improve predictive performance of medication response in complex conditions like SA. This approach has broader implications for advancing personalized medicine through the effective use of multifactorial data.

Description
item.page.description.tableofcontents
item.page.relation.haspart
Cite As
Tahmin N, Chinthala LK, Marsico FL, et al. Machine learning models incorporating genotype and ancestry improve severe asthma risk prediction. Sci Rep. 2025;15(1):40243. Published 2025 Nov 17. doi:10.1038/s41598-025-24080-x
ISSN
Publisher
Series/Report
Sponsorship
Major
Extent
Identifier
Relation
Journal
Scientific Reports
Source
PMC
Alternative Title
Type
Article
Number
Volume
Conference Dates
Conference Host
Conference Location
Conference Name
Conference Panel
Conference Secretariat Location
Version
Final published version
Full Text Available at
This item is under embargo {{howLong}}