Browsing by Subject "Population genetics"
Now showing 1 - 6 of 6
Item Association of Host and Microbial Species Diversity across Spatial Scales in Desert Rodent Communities
(PLOS (Public Library of Science), 2014-10-24) Gavish, Yoni; Kedem, Hadar; Messika, Irit; Cohen, Carmit; Toh, Evelyn; Munro, Daniel; Dong, Qunfeng; Fuqua, Clay; Clay, Keith; Hawlena, Hadas; Department of Microbiology & Immunology, School of Medicine
Relationships between host and microbial diversity have important ecological and applied implications. Theory predicts that these relationships will depend on the spatio-temporal scale of the analysis and the niche breadth of the organisms in question, but representative data on host-microbial community assemblage in nature are lacking. We employed a natural gradient of rodent species richness and quantified bacterial communities in rodent blood at several hierarchical spatial scales to test the hypothesis that associations between host and microbial species diversity will be positive in communities dominated by organisms with broad niches sampled at large scales. Following pyrosequencing of rodent blood samples, bacterial communities were found to be composed primarily of broad-niche lineages. These communities exhibited positive correlations between host diversity, microbial diversity and the likelihood for rare pathogens at the regional scale but not at finer scales. These findings demonstrate how microbial diversity is affected by host diversity at different spatial scales and suggest that the relationships between host diversity and overall disease risk are not always negative, as the dilution hypothesis predicts.

Item Computational modeling for identification of low-frequency single nucleotide variants
(2015-11-16) Hao, Yangyang; Liu, Yunlong; Edenberg, Howard J.; Li, Lang; Nakshatr, Harikrishna
Reliable detection of low-frequency single nucleotide variants (SNVs) carries great significance in many applications.
In cancer genetics, the frequencies of somatic variants from tumor biopsies tend to be low due to contamination with normal tissue and tumor heterogeneity. Circulating tumor DNA monitoring also faces the challenge of detecting low-frequency variants due to the small percentage of tumor DNA in blood. Moreover, in population genetics, although pooled sequencing is cost-effective compared with individual sequencing, pooling dilutes the signals of variants from any individual. Detection of low-frequency variants is difficult and can be confounded by multiple sources of error, especially next-generation sequencing artifacts. Existing methods are limited in sensitivity and mainly focus on frequencies around 5%; most fail to consider differential, context-specific sequencing artifacts. To face this challenge, we developed a computational and experimental framework, RareVar, to reliably identify low-frequency SNVs from high-throughput sequencing data. For optimized performance, RareVar utilized a supervised learning framework to model artifacts originating from different components of a specific sequencing pipeline. This is enabled by a customized, comprehensive benchmark dataset enriched with known low-frequency SNVs from the sequencing pipeline of interest. A genomic-context-specific sequencing error model was trained on the benchmark data to characterize the systematic sequencing artifacts and to derive the position-specific detection limit for sensitive low-frequency SNV detection. Further, a machine-learning algorithm utilized sequencing quality features to refine SNV candidates for higher specificity. RareVar outperformed existing approaches, especially at 0.5% to 5% frequency. We further explored the influence of statistical modeling on position-specific error modeling and showed the zero-inflated negative binomial to be the best-performing statistical distribution.
When replicating analyses on an Illumina MiSeq benchmark dataset, our method seamlessly adapted to technologies with different biochemistries. RareVar enables sensitive detection of low-frequency SNVs across different sequencing platforms and will facilitate research and clinical applications such as pooled sequencing, cancer early detection, prognostic assessment, metastatic monitoring, and identification of relapse or acquired resistance.

Item Exploring regional aspects of 3D facial variation within European individuals
(Springer Nature, 2023-03-06) Wilke, Franziska; Herrick, Noah; Matthews, Harold; Hoskens, Hanne; Singh, Sylvia; Shaffer, John R.; Weinberg, Seth M.; Shriver, Mark D.; Claes, Peter; Walsh, Susan; Biology, School of Science
Facial ancestry can be described as variation in facial features that is shared amongst members of a population due to environmental and genetic effects. Even within Europe, faces vary among subregions, and this variation may lead to confounding in genetic association studies if unaccounted for. Genetic studies use genetic principal components (PCs) to describe facial ancestry to circumvent this issue. Yet the phenotypic effect of these genetic PCs on the face has yet to be described, and phenotype-based alternatives have yet to be compared. In anthropological studies, consensus faces are utilized as they depict a phenotypic, not genetic, ancestry effect. In this study, we explored the effects of regional differences on facial ancestry in 744 Europeans using genetic and anthropological approaches. Both showed similar ancestry effects between subgroups, localized mainly to the forehead, nose, and chin. Consensus faces explained the variation seen in only the first three genetic PCs, differing more in magnitude than shape change.
Here we show only minor differences between the two methods and discuss a combined approach as a possible alternative for facial scan correction that is less cohort-dependent, more replicable, non-linear, and can be made open access for use across research groups, enhancing future studies in this field.

Item Forensic applications of associating human scalp hair morphology and pigmentation analysis at the microscopic and molecular level
(2017-08) Stubbs, Wesli Kay; Walsh, Susan; Picard, Christine; Berbari, Nicholas
Criminal investigation and the science behind evidence analysis is an ever-growing niche, and forensic DNA phenotyping (FDP) is no exception. For years the only information given to authorities regarding DNA found at a crime scene was STR analysis and matching to a comparative DNA sample from a known source. However, what happens when there is no suspect against whom to compare DNA profiles, or the case involves a missing person where the only available piece of evidence is a biological sample found at the scene? Before FDP, not much could be done with the DNA sample and the investigation would be stalled. Now it is becoming possible to statistically predict an individual’s visual characteristics using FDP. Currently, with the use of IrisPlex, HIrisPlex, and HIrisPlex-S, statistical analyses and predictions can be done for categorical eye, hair, and skin color by looking at specific genes and their associated SNPs, such as HERC2 and OCA2. The more that is understood about trait-determining genes and their functional significance with regard to our physical traits, the more phenotypes can be added to these prediction tools. In an effort to discover additional genes associated with human phenotypes, this study looked at thirty-two pigmentation-associated candidate genes and eleven hair structure and morphology associated genes in owl monkey pelage samples.
Although the samples were not of human origin, it is important to point out the high conservation between humans and their non-human primate relatives. The owl monkeys used in this study were helpful for tracking expression levels of genes controlling different pigmentation and hair structure types, because each monkey had intra-individual variation in thickness and in coat color, which allowed the generation of potential candidate genes for human investigation. Of the 43 total candidate genes analyzed, 36 had successful amplification, and 28 showed a significant difference in expression when comparing the different samples. The second part of this study was to compare quantitative characteristics of human hair in physical samples and two-dimensional (2D) photos. A test set of 45 individuals had 3-5 hairs from the vertex of their head plucked and analyzed, and a 2D photograph was taken of their scalp hair. The aim was to derive quantitative hair phenotypes (such as hair width and amount of curl) from 2D imagery when physical samples are not available for analysis. This matters because the majority of genotype-phenotype databases consist solely of photographic imagery and seldom have hairs that can be microscopically prepared for analysis. Defining measurable phenotypes from 2D photos that strongly correlate with their physical counterparts allows for the generation of a more accurate phenotype for future genome-wide association studies (GWAS), within and outside this laboratory, that study hair thickness and hair curl.
Three different quantitative phenotypes were compared between the microscopic and 2D photo analysis.

Item Identifying polymorphic cis-regulatory variants as risk markers for lung carcinogenesis and chemotherapy responses in tobacco smokers from eastern India
(Springer Nature, 2023-03-10) Sengupta, Debmalya; Mukhopadhyay, Pramiti; Banerjee, Souradeep; Ganguly, Kausik; Mascharak, Prateek; Mukherjee, Noyonika; Mitra, Sangeeta; Bhattacharjee, Samsiddhi; Mitra, Ritabrata; Sarkar, Abhijit; Chaudhuri, Tamohan; Bhattacharjee, Gautam; Nath, Somsubhra; Roychoudhury, Susanta; Sengupta, Mainak; Biochemistry and Molecular Biology, School of Medicine
Aberrant expression of xenobiotic metabolism and DNA repair genes is critical to lung cancer pathogenesis. This study aims to identify the cis-regulatory variants of the genes modulating lung cancer risk among tobacco smokers and altering their chemotherapy responses. From a list of 2984 SNVs, prioritization and functional annotation revealed 22 cis-eQTLs of 14 genes within the gene expression-correlated DNase I hypersensitive sites using lung tissue-specific ENCODE, GTEx, Roadmap Epigenomics, and TCGA datasets. The 22 cis-regulatory variants are predicted to alter the binding of 44 transcription factors (TFs) expressed in lung tissue. Interestingly, 6 reported lung cancer-associated variants were found in linkage disequilibrium (LD) with 5 prioritized cis-eQTLs from our study. A case–control study with 3 promoter cis-eQTLs (p < 0.01) on 101 lung cancer patients and 401 healthy controls from eastern India with confirmed smoking history revealed an association of rs3764821 (ALDH3B1) (OR = 2.53, 95% CI = 1.57–4.07, p = 0.00014) and rs3748523 (RAD52) (OR = 1.69, 95% CI = 1.17–2.47, p = 0.006) with lung cancer risk.
An analysis of the effect of different chemotherapy regimens on the overall survival of lung cancer patients with respect to the associated variants showed that the risk alleles of both variants significantly decreased (p < 0.05) patient survival.

Item Unsupervised representation learning on high-dimensional clinical data improves genomic discovery and prediction
(Springer Nature, 2024) Yun, Taedong; Cosentino, Justin; Behsaz, Babak; McCaw, Zachary R.; Hill, Davin; Luben, Robert; Lai, Dongbing; Bates, John; Yang, Howard; Schwantes-An, Tae-Hwi; Zhou, Yuchen; Khawaja, Anthony P.; Carroll, Andrew; Hobbs, Brian D.; Cho, Michael H.; McLean, Cory Y.; Hormozdiari, Farhad; Medical and Molecular Genetics, School of Medicine
Although high-dimensional clinical data (HDCD) are increasingly available in biobank-scale datasets, their use for genetic discovery remains challenging. Here we introduce an unsupervised deep learning model, Representation Learning for Genetic Discovery on Low-Dimensional Embeddings (REGLE), for discovering associations between genetic variants and HDCD. REGLE leverages variational autoencoders to compute nonlinear disentangled embeddings of HDCD, which become the inputs to genome-wide association studies (GWAS). REGLE can uncover features not captured by existing expert-defined features and enables the creation of accurate disease-specific polygenic risk scores (PRSs) in datasets with very few labeled data. We apply REGLE to perform GWAS on respiratory and circulatory HDCD: spirograms measuring lung function and photoplethysmograms measuring blood volume changes. REGLE replicates known loci while identifying others not previously detected. REGLE embeddings are predictive of overall survival, and PRSs constructed from REGLE loci improve disease prediction across multiple biobanks. Overall, REGLE embeddings contain clinically relevant information beyond that captured by existing expert-defined features, leading to improved genetic discovery and disease prediction.