Ancestry May Confound Genetic Machine Learning: Candidate-Gene Prediction of Opioid Use Disorder as an Example

Hatoum, Alexander S.; Wendt, Frank R.; Galimberti, Marco; Polimanti, Renato; Neale, Benjamin; Kranzler, Henry R.; Gelernter, Joel; Edenberg, Howard J.; Agrawal, Arpana

dc.contributor.author	Hatoum, Alexander S.
dc.contributor.author	Wendt, Frank R.
dc.contributor.author	Galimberti, Marco
dc.contributor.author	Polimanti, Renato
dc.contributor.author	Neale, Benjamin
dc.contributor.author	Kranzler, Henry R.
dc.contributor.author	Gelernter, Joel
dc.contributor.author	Edenberg, Howard J.
dc.contributor.author	Agrawal, Arpana
dc.contributor.department	Medical and Molecular Genetics, School of Medicine
dc.date.accessioned	2023-10-12T09:57:13Z
dc.date.available	2023-10-12T09:57:13Z
dc.date.issued	2021
dc.description.abstract	Background: Machine learning (ML) models are beginning to proliferate in psychiatry; however, ML models in psychiatric genetics have not always accounted for ancestry. Using an empirical example of a proposed genetic test for opioid use disorder (OUD), and exploring a similar test for tobacco dependence and a simulated binary phenotype, we show that genetic prediction using ML is vulnerable to ancestral confounding. Methods: We utilize five ML algorithms trained with 16 brain reward-derived "candidate" SNPs proposed for commercial use and examine their ability to predict OUD vs. ancestry in an out-of-sample test set (N = 1000, stratified into equal groups of n = 250 cases and controls each of European and African ancestry). We rerun analyses with 8 random sets of allele-frequency-matched SNPs. We contrast findings with 11 genome-wide significant variants for tobacco smoking. To document generalizability, we generate and test a random phenotype. Results: None of the five ML algorithms predicted OUD better than chance when ancestry was balanced; instead, their predictions were confounded with ancestry in the out-of-sample test. In addition, the algorithms preferentially predicted admixed subpopulations. Random sets of variants matched to the candidate SNPs by allele frequency produced similar bias. Genome-wide significant tobacco smoking variants were also confounded by ancestry. Finally, random SNPs predicting a random simulated phenotype show that the bias attributable to ancestral confounding could impact any ML-based genetic prediction. Conclusions: Researchers and clinicians are encouraged to be skeptical of claims of high prediction accuracy from ML-derived genetic algorithms for polygenic traits like addiction, particularly when using candidate variants.
dc.eprint.version	Author's manuscript
dc.identifier.citation	Hatoum AS, Wendt FR, Galimberti M, et al. Ancestry may confound genetic machine learning: Candidate-gene prediction of opioid use disorder as an example. Drug Alcohol Depend. 2021;229(Pt B):109115. doi:10.1016/j.drugalcdep.2021.109115
dc.identifier.uri	https://hdl.handle.net/1805/36296
dc.language.iso	en_US
dc.publisher	Elsevier
dc.relation.isversionof	10.1016/j.drugalcdep.2021.109115
dc.relation.journal	Drug and Alcohol Dependence
dc.rights	Publisher Policy
dc.source	PMC
dc.subject	Opioid use disorder
dc.subject	Machine learning
dc.subject	Algorithmic bias
dc.subject	Ancestry
dc.subject	Candidate genes
dc.title	Ancestry May Confound Genetic Machine Learning: Candidate-Gene Prediction of Opioid Use Disorder as an Example
dc.type	Article
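
The confounding mechanism described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' pipeline: the allele frequencies, the training-set imbalance (cases drawn mostly from one ancestry group), and the nearest-centroid classifier are all assumptions chosen for illustration, and the phenotype is random by construction. Only the 16-SNP panel size and the balanced test design (250 cases and 250 controls per ancestry group) follow the abstract.

```python
import random

random.seed(0)

N_SNPS = 16  # panel size taken from the abstract
# Hypothetical allele frequencies that differ between two ancestry groups.
FREQ = {
    "A": [0.2 + 0.04 * i for i in range(N_SNPS)],
    "B": [0.8 - 0.04 * i for i in range(N_SNPS)],
}

def genotype(group):
    """Draw alternate-allele counts (0, 1, or 2) at each SNP for one person."""
    return [(random.random() < p) + (random.random() < p) for p in FREQ[group]]

def sample(n, group, label):
    """n people from one ancestry group; the label never depends on genotype."""
    return [(genotype(group), group, label) for _ in range(n)]

def centroid(rows):
    """Mean genotype vector over a list of (genotype, group, label) rows."""
    return [sum(col) / len(rows) for col in zip(*(g for g, _, _ in rows))]

def sq_dist(x, c):
    return sum((a - b) ** 2 for a, b in zip(x, c))

# Assumed training imbalance: cases mostly from group A, controls mostly from B.
train_set = (sample(800, "A", 1) + sample(200, "B", 1)
             + sample(200, "A", 0) + sample(800, "B", 0))
case_centroid = centroid([r for r in train_set if r[2] == 1])
control_centroid = centroid([r for r in train_set if r[2] == 0])

def predict(x):
    """Nearest-centroid rule (an illustrative stand-in for the ML algorithms)."""
    return 1 if sq_dist(x, case_centroid) < sq_dist(x, control_centroid) else 0

# Ancestry-balanced test set: 250 cases and 250 controls per group (N = 1000).
test_set = (sample(250, "A", 1) + sample(250, "A", 0)
            + sample(250, "B", 1) + sample(250, "B", 0))
preds = [predict(g) for g, _, _ in test_set]

# How often the prediction matches the (random) phenotype label.
phenotype_accuracy = sum(
    p == y for p, (_, _, y) in zip(preds, test_set)) / len(test_set)
# How often "predicted case" simply coincides with membership in group A.
ancestry_agreement = sum(
    p == (grp == "A") for p, (_, grp, _) in zip(preds, test_set)) / len(test_set)

print(f"phenotype accuracy: {phenotype_accuracy:.3f}")
print(f"ancestry agreement: {ancestry_agreement:.3f}")
```

Under these assumptions, phenotype accuracy on the balanced test set should sit near chance, while the predictions should track ancestry group closely. The classifier has learned ancestry rather than the phenotype.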

Files

Original bundle

Name: nihms-1825240.pdf
Size: 1.24 MB
Format: Adobe Portable Document Format

Collections

Open Access Policy Articles
Department of Medical and Molecular Genetics Works