Ancestry May Confound Genetic Machine Learning: Candidate-Gene Prediction of Opioid Use Disorder as an Example
dc.contributor.author | Hatoum, Alexander S. | |
dc.contributor.author | Wendt, Frank R. | |
dc.contributor.author | Galimberti, Marco | |
dc.contributor.author | Polimanti, Renato | |
dc.contributor.author | Neale, Benjamin | |
dc.contributor.author | Kranzler, Henry R. | |
dc.contributor.author | Gelernter, Joel | |
dc.contributor.author | Edenberg, Howard J. | |
dc.contributor.author | Agrawal, Arpana | |
dc.contributor.department | Medical and Molecular Genetics, School of Medicine | |
dc.date.accessioned | 2023-10-12T09:57:13Z | |
dc.date.available | 2023-10-12T09:57:13Z | |
dc.date.issued | 2021 | |
dc.description.abstract | Background: Machine learning (ML) models are beginning to proliferate in psychiatry; however, ML models in psychiatric genetics have not always accounted for ancestry. Using an empirical example of a proposed genetic test for opioid use disorder (OUD), and exploring a similar test for tobacco dependence and a simulated binary phenotype, we show that genetic prediction using ML is vulnerable to ancestral confounding. Methods: We use five ML algorithms trained with 16 brain reward-derived "candidate" SNPs proposed for commercial use and examine their ability to predict OUD vs. ancestry in an out-of-sample test set (N = 1000, stratified into four equal groups of n = 250: cases and controls of European ancestry, and cases and controls of African ancestry). We rerun analyses with 8 random sets of allele-frequency-matched SNPs. We contrast findings with 11 genome-wide significant variants for tobacco smoking. To document generalizability, we generate and test a random phenotype. Results: None of the 5 ML algorithms predicted OUD better than chance when ancestry was balanced, but their predictions were confounded with ancestry in an out-of-sample test. In addition, the algorithms preferentially predicted admixed subpopulations. Random sets of variants matched to the candidate SNPs by allele frequency produced similar bias. Genome-wide significant tobacco smoking variants were also confounded by ancestry. Finally, random SNPs predicting a random simulated phenotype showed that the bias attributable to ancestral confounding could affect any ML-based genetic prediction. Conclusions: Researchers and clinicians are encouraged to be skeptical of claims of high prediction accuracy from ML-derived genetic algorithms for polygenic traits like addiction, particularly when using candidate variants. | |
dc.eprint.version | Author's manuscript | |
dc.identifier.citation | Hatoum AS, Wendt FR, Galimberti M, et al. Ancestry may confound genetic machine learning: Candidate-gene prediction of opioid use disorder as an example. Drug Alcohol Depend. 2021;229(Pt B):109115. doi:10.1016/j.drugalcdep.2021.109115 | |
dc.identifier.uri | https://hdl.handle.net/1805/36296 | |
dc.language.iso | en_US | |
dc.publisher | Elsevier | |
dc.relation.isversionof | 10.1016/j.drugalcdep.2021.109115 | |
dc.relation.journal | Drug and Alcohol Dependence | |
dc.rights | Publisher Policy | |
dc.source | PMC | |
dc.subject | Opioid use disorder | |
dc.subject | Machine learning | |
dc.subject | Algorithmic bias | |
dc.subject | Ancestry | |
dc.subject | Candidate genes | |
dc.title | Ancestry May Confound Genetic Machine Learning: Candidate-Gene Prediction of Opioid Use Disorder as an Example | |
dc.type | Article |
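The confounding mechanism summarized in the abstract can be illustrated with a toy simulation. This is a minimal sketch, not the authors' pipeline. The two ancestry labels "A" and "B", the allele frequencies, the 80/20 ancestry imbalance between training cases and controls, and the nearest-centroid classifier (standing in for the study's five ML algorithms) are all hypothetical choices. Phenotype is assigned independently of genotype, so any accuracy above chance can only come from ancestry. The balanced test set mirrors the abstract's design of 250 cases and 250 controls per ancestry.

```python
import random


def simulate_genotypes(n, freqs, rng):
    """Draw n individuals; each SNP dosage is Binomial(2, p) for that SNP's allele frequency p."""
    return [[sum(rng.random() < p for _ in range(2)) for p in freqs] for _ in range(n)]


def make_cohort(cell_counts, freqs_by_ancestry, rng):
    """Build a cohort from {(ancestry, is_case): count}.

    Case status is assigned independently of genotype, so any apparent
    genetic 'signal' can only reflect ancestry.
    """
    X, y, anc = [], [], []
    for (a, is_case), n in cell_counts.items():
        X += simulate_genotypes(n, freqs_by_ancestry[a], rng)
        y += [is_case] * n
        anc += [a] * n
    return X, y, anc


def fit_nearest_centroid(X, y):
    """Nearest-centroid classifier expressed as a linear rule: weights and midpoints per SNP."""
    n_snps = len(X[0])
    cases = [x for x, label in zip(X, y) if label]
    controls = [x for x, label in zip(X, y) if not label]
    mu_case = [sum(x[j] for x in cases) / len(cases) for j in range(n_snps)]
    mu_ctrl = [sum(x[j] for x in controls) / len(controls) for j in range(n_snps)]
    weights = [c - k for c, k in zip(mu_case, mu_ctrl)]
    midpoints = [(c + k) / 2 for c, k in zip(mu_case, mu_ctrl)]
    return weights, midpoints


def predict(model, X):
    """Call 'case' when an individual is closer to the case centroid than the control centroid."""
    weights, midpoints = model
    return [sum(w * (g - m) for w, g, m in zip(weights, x, midpoints)) > 0 for x in X]


def accuracy(pred, y):
    return sum(p == t for p, t in zip(pred, y)) / len(y)


rng = random.Random(2021)
N_SNPS = 16
# Hypothetical allele frequencies: every SNP is more common in ancestry "B".
FREQS = {"A": [0.2] * N_SNPS, "B": [0.6] * N_SNPS}

# Confounded design (hypothetical): cases mostly ancestry B, controls mostly ancestry A.
confounded = {("A", True): 100, ("B", True): 400, ("A", False): 400, ("B", False): 100}
# Ancestry-balanced design: 250 cases and 250 controls within each ancestry.
balanced = {(a, c): 250 for a in "AB" for c in (True, False)}

X_train, y_train, _ = make_cohort(confounded, FREQS, rng)
model = fit_nearest_centroid(X_train, y_train)

X_conf, y_conf, _ = make_cohort(confounded, FREQS, rng)
X_bal, y_bal, anc_bal = make_cohort(balanced, FREQS, rng)
pred_bal = predict(model, X_bal)

acc_confounded = accuracy(predict(model, X_conf), y_conf)
acc_balanced = accuracy(pred_bal, y_bal)
# Fraction of ancestry-B individuals called "case", regardless of true status.
called_case_B = sum(p for p, a in zip(pred_bal, anc_bal) if a == "B") / anc_bal.count("B")

print(f"confounded test accuracy: {acc_confounded:.2f}")
print(f"balanced test accuracy:   {acc_balanced:.2f}")
print(f"ancestry B called case:   {called_case_B:.2f}")
```

Under these assumptions, the classifier should look well above chance on a test set that shares the training set's ancestry imbalance. On the ancestry-balanced set it should fall to roughly 50%, because it mostly labels ancestry-B individuals as cases and ancestry-A individuals as controls. Evaluating on a set balanced for ancestry within cases and controls is what exposes the confound.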