Old vs. New Local Ancestry Inference in HCHS/SOL: A Comparative Study
Abstract
Hispanic/Latino populations are admixed, with genetic contributions from multiple ancestral populations. Genetic association studies in these admixed populations often use methods such as admixture mapping, which relies on inferred counts of "local" ancestry, i.e., the ancestral population from which each copy of a locus derives. Local ancestries are inferred using external reference panels that represent the ancestral populations, making the choice of inference method and reference panel critical. This study used a dataset of Hispanic/Latino individuals from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL) to compare the "old" local ancestry inferences, performed using the state-of-the-art inference method RFMix, with "new" inferences performed using Fast Local Ancestry Estimation (FLARE) and an updated reference panel. We compared the two sets of inferences in terms of global and local ancestry correlations, as well as admixture mapping-based associations. Overall, the old RFMix and new FLARE inferences were highly similar for both global and local ancestries, and the FLARE-inferred datasets yielded admixture mapping results consistent with those computed from the RFMix inferences. However, in some genomic regions the old and new local ancestries had relatively lower correlations (Pearson R < 0.9). Most of these regions (86.42%) mapped either to ENCODE blacklist regions or to gene clusters, compared with 7.67% of randomly matched regions with high correlations (Pearson R > 0.97) between old and new local ancestries.
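The per-region comparison described above can be illustrated with a minimal sketch. It assumes that, for each genomic region, the old and new inferences are available as per-individual local ancestry counts (0, 1, or 2 copies of a given ancestry). The function names, the dictionary layout, and the toy values are hypothetical and are not taken from the study's pipeline or from HCHS/SOL data; only the R < 0.9 threshold comes from the abstract.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Correlation is undefined when either sequence is constant.
        return float("nan")
    return sxy / sqrt(sxx * syy)


def flag_low_correlation(old_counts, new_counts, threshold=0.9):
    """Return the regions whose old/new correlation falls below `threshold`.

    Both arguments map a region label to a list of per-individual ancestry
    counts (hypothetical layout). Regions with undefined correlation are
    not flagged, because NaN compares false against the threshold.
    """
    flagged = []
    for region, old in old_counts.items():
        r = pearson_r(old, new_counts[region])
        if r < threshold:
            flagged.append(region)
    return flagged


# Toy counts for six individuals in two made-up regions (not real data).
old_counts = {
    "region_a": [0, 1, 2, 1, 0, 2],
    "region_b": [0, 1, 2, 1, 0, 2],
}
new_counts = {
    "region_a": [0, 1, 2, 1, 0, 2],  # identical to the old inference
    "region_b": [1, 1, 2, 0, 0, 2],  # two individuals differ
}

low_regions = flag_low_correlation(old_counts, new_counts)
print(low_regions)
```

In this toy input only `region_b` falls below the threshold. In practice the flagged regions would then be intersected with annotations such as ENCODE blacklist regions or gene clusters, a step this sketch does not cover.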