Browsing by Author "Neale, Benjamin"
Ancestry May Confound Genetic Machine Learning: Candidate-Gene Prediction of Opioid Use Disorder as an Example (Elsevier, 2021)
Hatoum, Alexander S.; Wendt, Frank R.; Galimberti, Marco; Polimanti, Renato; Neale, Benjamin; Kranzler, Henry R.; Gelernter, Joel; Edenberg, Howard J.; Agrawal, Arpana; Medical and Molecular Genetics, School of Medicine
Background: Machine learning (ML) models are beginning to proliferate in psychiatry; however, ML models in psychiatric genetics have not always accounted for ancestry. Using an empirical example of a proposed genetic test for opioid use disorder (OUD), and exploring a similar test for tobacco dependence and a simulated binary phenotype, we show that genetic prediction using ML is vulnerable to ancestral confounding.
Methods: We utilize five ML algorithms trained with 16 brain reward-derived "candidate" SNPs proposed for commercial use and examine their ability to predict OUD vs. ancestry in an out-of-sample test set (N = 1000, stratified into equal groups of n = 250 cases and controls each of European and African ancestry). We rerun analyses with 8 random sets of allele-frequency matched SNPs. We contrast findings with 11 genome-wide significant variants for tobacco smoking. To document generalizability, we generate and test a random phenotype.
Results: None of the 5 ML algorithms predicted OUD better than chance when ancestry was balanced, but the algorithms were confounded with ancestry in an out-of-sample test. In addition, the algorithms preferentially predicted admixed subpopulations. Random sets of variants matched to the candidate SNPs by allele frequency produced similar bias. Genome-wide significant tobacco smoking variants were also confounded by ancestry. Finally, random SNPs predicting a random simulated phenotype show that the bias attributable to ancestral confounding could impact any ML-based genetic prediction.
Conclusions: Researchers and clinicians are encouraged to be skeptical of claims of high prediction accuracy from ML-derived genetic algorithms for polygenic traits like addiction, particularly when using candidate variants.

FAVOR: functional annotation of variants online resource and annotator for variation across the human genome (Oxford University Press, 2023)
Zhou, Hufeng; Arapoglou, Theodore; Li, Xihao; Li, Zilin; Zheng, Xiuwen; Moore, Jill; Asok, Abhijith; Kumar, Sushant; Blue, Elizabeth E.; Buyske, Steven; Cox, Nancy; Felsenfeld, Adam; Gerstein, Mark; Kenny, Eimear; Li, Bingshan; Matise, Tara; Philippakis, Anthony; Rehm, Heidi L.; Sofia, Heidi J.; Snyder, Grace; NHGRI Genome Sequencing Program Variant Functional Annotation Working Group; Weng, Zhiping; Neale, Benjamin; Sunyaev, Shamil R.; Lin, Xihong; Biostatistics, School of Public Health
Large biobank-scale whole genome sequencing (WGS) studies are rapidly identifying a multitude of coding and non-coding variants. They provide an unprecedented resource for illuminating the genetic basis of human diseases. Variant functional annotations play a critical role in WGS analysis, result interpretation, and prioritization of disease- or trait-associated causal variants. Existing functional annotation databases have limited scope to perform online queries and functionally annotate the genotype data of large biobank-scale WGS studies. We develop the Functional Annotation of Variants Online Resources (FAVOR) to meet these pressing needs. FAVOR provides a comprehensive multi-faceted variant functional annotation online portal that summarizes and visualizes findings of all possible nine billion single nucleotide variants (SNVs) across the genome. It allows for rapid variant-, gene- and region-level queries of variant functional annotations.
FAVOR integrates variant functional information from multiple sources to describe the functional characteristics of variants and facilitates prioritizing plausible causal variants influencing human phenotypes. Furthermore, we provide a scalable annotation tool, FAVORannotator, to functionally annotate large-scale WGS studies and efficiently store the genotype and their variant functional annotation data in a single file using the annotated Genomic Data Structure (aGDS) format, making downstream analysis more convenient. FAVOR and FAVORannotator are available at https://favor.genohub.org.

Whole Genome Sequencing of Pedigrees With High Density of Substance Use and Psychiatric Disorders: A Meeting Report (Wiley, 2025)
Hill, Shirley Y.; Edenberg, Howard J.; Corvin, Aiden; Thorgeirsson, Thorgeir; Below, Jennifer E.; Goldman, David; Leal, Suzanne; Almasy, Laura; Cox, Nancy J.; Daly, Mark; Neale, Benjamin; Vrieze, Scott; Zoghbi, Huda; Biochemistry and Molecular Biology, School of Medicine
The National Institute on Drug Abuse convened a panel of scientists with expertise in substance use disorders (SUD) and genetic methodologies primarily to determine the feasibility of performing whole genome sequencing utilizing existing pedigree collections with a high density of SUD and psychiatric disorders. A major focus was on determining if there had been any successes in identifying genetic variants for complex traits in family-based designs. Such information could provide assurance that whole genome sequencing might provide significant pay-offs, particularly in the pursuit of rare variants and copy number variants. An important goal was to discuss and evaluate optimal strategies for studying genetic variants in human samples.
Specific topics were (a) to consider whether the smaller number of cases typically available in family studies, versus the larger number available in biobanks, can reveal unique information; (b) to identify potential gaps in information available in biobank data that might be supplemented with family data; and (c) to consider the optimal SUD phenotypic definitions (e.g., quantity of use, problem-oriented) and data collection instruments (self-report or clinician administered) that are both practical and efficient to collect, and likely to provide important insights concerning prevention, intervention, and medication development. Conclusions reached by the panel included optimism about the successes that have occurred in the existing family studies ascertained to include densely affected pedigrees. Evaluation of methodologies led, overall, to a panel consensus that steps should be taken to utilize biobank collections in conjunction with family-based investigations for optimal variant discovery.