Browsing by Author "Lin, Hai"
Item Allele-specific expression and high-throughput reporter assay reveal functional genetic variants associated with alcohol use disorders (Springer Nature, 2021-04)
Rao, Xi; Thapa, Kriti S.; Chen, Andy B.; Lin, Hai; Gao, Hongyu; Reiter, Jill L.; Hargreaves, Katherine A.; Ipe, Joseph; Lai, Dongbing; Xuei, Xiaoling; Wang, Yue; Gu, Hongmei; Kapoor, Manav; Farris, Sean P.; Tischfield, Jay; Foroud, Tatiana; Goate, Alison M.; Skaar, Todd C.; Mayfield, R. Dayne; Edenberg, Howard J.; Liu, Yunlong; Medical and Molecular Genetics, School of Medicine
Genome-wide association studies (GWAS) of complex traits, such as alcohol use disorders (AUD), usually identify variants in non-coding regions and cannot by themselves distinguish whether the associated variants are functional or in linkage disequilibrium with the functional variants. Transcriptome studies can identify genes whose expression differs between alcoholics and controls. To test which variants associated with AUD may cause expression differences, we integrated data from deep RNA-seq and GWAS of four postmortem brain regions from 30 subjects with AUD and 30 controls to analyze allele-specific expression (ASE). We identified 88 genes with differential ASE in subjects with AUD compared to controls. Next, to test one potential mechanism contributing to the differential ASE, we analyzed single nucleotide polymorphisms (SNPs) in the 3′ untranslated regions (3′UTR) of these genes. Of the 88 genes with differential ASE, 61 genes contained 437 SNPs in the 3′UTR with at least one heterozygote among the subjects studied. Using a modified PASSPORT-seq (parallel assessment of polymorphisms in miRNA target-sites by sequencing) assay, we identified 25 SNPs that affected RNA levels in a consistent manner in two neuroblastoma cell lines, SH-SY5Y and SK-N-BE(2). Many of these SNPs are in binding sites of miRNAs and RNA-binding proteins, indicating that these SNPs are likely causal variants of AUD-associated differential ASE.
In sum, we demonstrate that a combination of computational and experimental approaches provides a powerful strategy to uncover functionally relevant variants associated with the risk for AUD.

Item Allele-specific expression and high-throughput reporter assay reveal functional genetic variants associated with alcohol use disorders. (Springer, 2021-04)
Rao, Xi; Thapa, Kriti S.; Chen, Andy B.; Lin, Hai; Gao, Hongyu; Reiter, Jill L.; Hargreaves, Katherine A.; Ipe, Joseph; Lai, Dongbing; Xuei, Xiaoling; Wang, Yue; Gu, Hongmei; Kapoor, Manav; Farris, Sean P.; Tischfield, Jay; Foroud, Tatiana; Goate, Alison M.; Skaar, Todd C.; Mayfield, R. Dayne; Edenberg, Howard J.; Liu, Yunlong
Genome-wide association studies (GWAS) of complex traits, such as alcohol use disorders (AUD), usually identify variants in non-coding regions and cannot by themselves distinguish whether the associated variants are functional or in linkage disequilibrium with the functional variants. Transcriptome studies can identify genes whose expression differs between alcoholics and controls. To test which variants associated with AUD may cause expression differences, we integrated data from deep RNA-seq and GWAS of four postmortem brain regions from 30 subjects with AUD and 30 controls to analyze allele-specific expression (ASE). We identified 88 genes with differential ASE in subjects with AUD compared to controls. Next, to test one potential mechanism contributing to the differential ASE, we analyzed single nucleotide polymorphisms (SNPs) in the 3' untranslated regions (3'UTR) of these genes. Of the 88 genes with differential ASE, 61 genes contained 437 SNPs in the 3'UTR with at least one heterozygote among the subjects studied. Using a modified PASSPORT-seq (parallel assessment of polymorphisms in miRNA target-sites by sequencing) assay, we identified 25 SNPs that affected RNA levels in a consistent manner in two neuroblastoma cell lines, SH-SY5Y and SK-N-BE(2).
Many of these SNPs are in binding sites of miRNAs and RNA-binding proteins, indicating that these SNPs are likely causal variants of AUD-associated differential ASE. In sum, we demonstrate that a combination of computational and experimental approaches provides a powerful strategy to uncover functionally relevant variants associated with the risk for AUD.

Item Altered mRNA Splicing in SMN-Depleted Motor Neuron-Like Cells (Public Library of Science (PLoS), 2016)
Custer, Sara K.; Gilson, Timra D.; Li, Hongxia; Todd, A. Gary; Astroski, Jacob W.; Lin, Hai; Liu, Yunlong; Androphy, Elliot J.; Department of Dermatology, School of Medicine
Spinal muscular atrophy (SMA) is an intractable neurodegenerative disease afflicting 1 in 6,000 to 10,000 live births. One of the key functions of the SMN protein is regulation of spliceosome assembly. Reduced levels of the SMN protein that are observed in SMA have been shown to result in aberrant mRNA splicing. SMN-dependent mis-spliced transcripts in motor neurons may cause stresses that are particularly harmful and may serve as potential targets for the treatment of motor neuron disease or as biomarkers in the SMA patient population. We performed deep RNA sequencing using motor neuron-like NSC-34 cells to screen for SMN-dependent mRNA processing changes that occur following acute depletion of SMN. We identified SMN-dependent splicing changes, including an intron retention event that results in the production of a truncated Rit1 transcript. This intron-retained transcript is stable and is mis-spliced in spinal cord from symptomatic SMA mice. Constitutively active Rit1 ameliorated the neurite outgrowth defect in SMN-depleted NSC-34 cells, while expression of the truncated protein product of the mis-spliced Rit1 transcript inhibited neurite extension.
These results reveal new insights into the biological consequence of SMN-dependent splicing in motor neuron-like cells.

Item Computational modeling of splicing regulation (2017-04-20)
Lin, Hai; Wu, Huanmei; Janga, Sarath Chandra; Liu, Xiaowen; Liu, Yunlong
Alternative splicing is one of the most important post-transcriptional modifications in the cell. It increases the coding capacity of the genome by enabling one gene to encode multiple proteins. The majority of human protein-coding genes undergo alternative splicing, and mis-splicing of those genes is known to be associated with many human diseases. Therefore, it is important to study and understand the splicing regulatory machinery. Splicing regulation consists of two components: trans-acting regulators and cis-acting elements. In this dissertation, we explored these two aspects of splicing regulation. First, we investigated the relationship of three key trans-acting regulators, hnRNP A1, SRSF1 and U2AF, using transcriptome-wide individual-nucleotide resolution cross-linking and immunoprecipitation (iCLIP) data. Our results revealed competition between hnRNP A1 and SRSF1 at 3′ splice sites, and an inhibitory effect on U2AF recruitment after hnRNP A1 overexpression. We also discovered that Alu elements may serve as cis-acting elements and compete with authentic exons for the binding of U2AF. Second, we developed a machine learning algorithm to prioritize the disease-causing probability of intronic single-nucleotide variants (iSNVs) by evaluating their cis-acting impact on both alternative splicing and protein structure. The resulting predictive model can predict pathogenic iSNVs with high accuracy and outperforms popular algorithms such as splicing-based analysis of variants (SPANR) and combined annotation-dependent depletion (CADD). This suggests that protein structure features can provide an additional layer of information in prioritizing pathogenic iSNVs.
In conclusion, our studies provide remarkable insights into alternative splicing regarding both trans-acting regulation and cis-acting regulation. The discoveries of our research on trans-acting regulators are valuable for understanding splicing regulatory machinery. The algorithm we developed can be used to prioritize pathogenic iSNVs without needing to test them all in expensive and laborious assays.

Item Discriminating between disease-causing and neutral non-frameshifting micro-INDELs by support vector machines by means of integrated sequence- and structure-based features (Office of the Vice Chancellor for Research, 2013-04-05)
Zhao, Huiying; Yang, Yuedong; Lin, Hai; Zhang, Xinjun; Mort, Matthew; Cooper, David N.; Liu, Yunlong; Zhou, Yaoqi
Micro-INDELs (insertions or deletions of ≤20 bp) constitute the second most frequent class of human gene mutation after single nucleotide variants. A significant portion of exonic INDELs are non-frameshifting (NFS), serving to insert or delete a discrete number of amino-acid residues. Despite the relative abundance of NFS-INDELs, their damaging effect on protein structure and function has gone largely unstudied whilst bioinformatics tools for discriminating between disease-causing and neutral NFS-INDELs remain to be developed. We have developed such a technique (DDIG-in; Detecting DIsease-causing Genetic variations due to INDELs) by comparing the properties of disease-causing NFS-INDELs from the Human Gene Mutation Database (HGMD) with putatively neutral NFS-INDELs from the 1000 Genomes Project. Having considered 58 different sequence- and structure-based features, we found that predicted disordered regions around the NFS-INDEL region had the highest discriminative capability (disease versus neutral) with an Area Under the receiver-operating characteristic Curve (AUC) of 0.82 and a Matthews Correlation Coefficient (MCC) of 0.56. All features studied were combined by support vector machines (SVM) and selected by a greedy algorithm.
The resulting SVM models were trained and tested by ten-fold cross-validation on the microdeletion dataset and independently tested on the microinsertion dataset and vice versa. The final SVM model for determining NFS-INDEL disease-causing probability was built on non-redundant datasets with a protein sequence identity cutoff of 35% and yielded an MCC value of 0.68, an accuracy of 84% and an AUC of 0.89. Predicted disease-causing probabilities exhibited a strong negative correlation with the average minor allele frequency (correlation coefficient, -0.84). DDIG-in, available at http://sparks.informatics.iupui.edu, can be used to estimate the disease-causing probability for a given NFS-INDEL.

Item Estrogen induces global reorganization of chromatin structure in human breast cancer cells (PLoS, 2014-12-03)
Mourad, Raphael; Hsu, Pei-Yin; Juran, Liran; Shen, Changyu; Koneru, Prasad; Lin, Hai; Liu, Yunlong; Nephew, Kenneth; Huang, Tim H.; Li, Lang; Department of Medical and Molecular Genetics, IU School of Medicine
In the cell nucleus, each chromosome is confined to a chromosome territory. This spatial organization of chromosomes plays a crucial role in gene regulation and genome stability. An additional level of organization has been discovered at the chromosome scale: the spatial segregation into open and closed chromatins to form two genome-wide compartments. Although considerable progress has been made in our knowledge of chromatin organization, a fundamental issue remains the understanding of its dynamics, especially in cancer. To address this issue, we performed genome-wide mapping of chromatin interactions (Hi-C) over the time after estrogen stimulation of breast cancer cells. To biologically interpret these interactions, we integrated with estrogen receptor α (ERα) binding events, gene expression and epigenetic marks.
We show that gene-rich chromosomes as well as areas of open and highly transcribed chromatins are rearranged to greater spatial proximity, thus enabling genes to share transcriptional machinery and regulatory elements. At a smaller scale, differentially interacting loci are enriched for cancer proliferation and estrogen-related genes. Moreover, these loci are correlated with higher ERα binding events and gene expression. Taken together, these results reveal the role of a hormone, estrogen, on genome organization, and its effect on gene regulation in cancer.

Item Evaluation of the Genetic Basis of Familial Aggregation of Pacemaker Implantation by a Large Next Generation Sequencing Panel (Public Library of Science (PLoS), 2015)
Celestino-Soper, Patrícia B. S.; Doytchinova, Anisiia; Steiner, Hillel A.; Uradu, Andrea; Lynnes, Ty C.; Groh, William J.; Miller, John M.; Lin, Hai; Gao, Hongyu; Wang, Zhiping; Liu, Yunlong; Chen, Peng-Sheng; Vatta, Matteo; Department of Medical and Molecular Genetics, IU School of Medicine
BACKGROUND: The etiology of conduction disturbances necessitating permanent pacemaker (PPM) implantation is often unknown, although familial aggregation of PPM (faPPM) suggests a possible genetic basis. We developed a pan-cardiovascular next generation sequencing (NGS) panel to genetically characterize a selected cohort of faPPM. MATERIALS AND METHODS: We designed and validated a custom NGS panel targeting the coding and splicing regions of 246 genes with involvement in cardiac pathogenicity. We enrolled 112 PPM patients and selected nine (8%) with faPPM to be analyzed by NGS. RESULTS: Our NGS panel covers 95% of the intended target with an average of 229x read depth at a minimum of 15-fold depth, reaching a SNP true positive rate of 98%. The faPPM patients presented with isolated cardiac conduction disease (ICCD) or sick sinus syndrome (SSS) without overt structural heart disease or identifiable secondary etiology.
Three patients (33.3%) had heterozygous deleterious variants previously reported in autosomal dominant cardiac diseases including CCD: LDB3 (p.D117N) and TRPM4 (p.G844D) variants in patient 4; TRPM4 (p.G844D) and ABCC9 (p.V734I) variants in patient 6; and SCN5A (p.T220I) and APOB (p.R3527Q) variants in patient 7. CONCLUSION: FaPPM occurred in 8% of our PPM clinic population. The employment of massively parallel sequencing for a large selected panel of cardiovascular genes identified a high percentage (33.3%) of the faPPM patients with deleterious variants previously reported in autosomal dominant cardiac diseases, suggesting that genetic variants may play a role in faPPM.

Item HNRNPA1 promotes recognition of splice site decoys by U2AF2 in vivo (Cold Spring Harbor Laboratory Press, 2018-05)
Howard, Jonathan M.; Lin, Hai; Wallace, Andrew J.; Kim, Garam; Draper, Jolene M.; Haeussler, Maximilian; Katzman, Sol; Toloue, Masoud; Liu, Yunlong; Sanford, Jeremy R.; Medical and Molecular Genetics, School of Medicine
Alternative pre-mRNA splicing plays a major role in expanding the transcript output of human genes. This process is regulated, in part, by the interplay of trans-acting RNA binding proteins (RBPs) with myriad cis-regulatory elements scattered throughout pre-mRNAs. These molecular recognition events are critical for defining the protein-coding sequences (exons) within pre-mRNAs and directing spliceosome assembly on noncoding regions (introns). One of the earliest events in this process is recognition of the 3' splice site (3'ss) by U2 small nuclear RNA auxiliary factor 2 (U2AF2). Splicing regulators, such as the heterogeneous nuclear ribonucleoprotein A1 (HNRNPA1), influence spliceosome assembly both in vitro and in vivo, but their mechanisms of action remain poorly described on a global scale. HNRNPA1 also promotes proofreading of 3'ss sequences through a direct interaction with the U2AF heterodimer.
To determine how HNRNPA1 regulates U2AF-RNA interactions in vivo, we analyzed U2AF2 RNA binding specificity using individual-nucleotide resolution crosslinking immunoprecipitation (iCLIP) in control and HNRNPA1 overexpression cells. We observed changes in the distribution of U2AF2 crosslinking sites relative to the 3'ss of alternative cassette exons but not constitutive exons upon HNRNPA1 overexpression. A subset of these events shows a concomitant increase of U2AF2 crosslinking at distal intronic regions, suggesting a shift of U2AF2 to "decoy" binding sites. Of the many noncanonical U2AF2 binding sites, Alu-derived RNA sequences represented one of the most abundant classes of HNRNPA1-dependent decoys. We propose that one way HNRNPA1 regulates exon definition is to modulate the interaction of U2AF2 with decoy or bona fide 3'ss.

Item Impact of human pathogenic micro-insertions and micro-deletions on post-transcriptional regulation (Oxford University Press, 2014-06-01)
Zhang, Xinjun; Lin, Hai; Zhao, Huiying; Hao, Yangyang; Mort, Matthew; Cooper, David N.; Zhou, Yaoqi; Liu, Yunlong; Department of Medical & Molecular Genetics, IU School of Medicine
Small insertions/deletions (INDELs) of ≤21 bp comprise 18% of all recorded mutations causing human inherited disease and are evident in 24% of documented Mendelian diseases. INDELs affect gene function in multiple ways: for example, by introducing premature stop codons that either lead to the production of truncated proteins or affect transcriptional efficiency. However, the means by which they impact post-transcriptional regulation, including alternative splicing, have not been fully evaluated. In this study, we collate disease-causing INDELs from the Human Gene Mutation Database (HGMD) and neutral INDELs from the 1000 Genomes Project. The potential of these two types of INDELs to affect binding-site affinity of RNA-binding proteins (RBPs) was then evaluated.
We identified several sequence features that can distinguish disease-causing INDELs from neutral INDELs. Moreover, we built a machine-learning predictor called PinPor (predicting pathogenic small insertions and deletions affecting post-transcriptional regulation, http://watson.compbio.iupui.edu/pinpor/) to ascertain which newly observed INDELs are likely to be pathogenic. Our results show that disease-causing INDELs are more likely to ablate RBP-binding sites and tend to affect more RBP-binding sites than neutral INDELs. Additionally, disease-causing INDELs give rise to greater deviations in binding affinity than neutral INDELs. We also demonstrated that disease-causing INDELs may be distinguished from neutral INDELs by several sequence features, such as their proximity to splice sites and their potential effects on RNA secondary structure. This predictor showed satisfactory performance in identifying numerous pathogenic INDELs, with a Matthews correlation coefficient (MCC) value of 0.51 and an accuracy of 0.75.

Item Lessons learned from whole exome sequencing in multiplex families affected by a complex genetic disorder, intracranial aneurysm (PLoS, 2015-03-24)
Farlow, Janice L.; Lin, Hai; Sauerbeck, Laura; Lai, Dongbing; Koller, Daniel L.; Pugh, Elizabeth; Hetrick, Kurt; Ling, Hua; Kleinloog, Rachel; van der Vlies, Peter; Deelen, Patrick; Swertz, Morris A.; Verweij, Bon H.; Regli, Luca; Rinkel, Gabriel J.E.; Ruigrok, Ynte M.; Doheny, Kimberly; Liu, Yunlong; Broderick, Joseph; Foroud, Tatiana; Department of Medical and Molecular Genetics, IU School of Medicine
Genetic risk factors for intracranial aneurysm (IA) are not yet fully understood. Genome-wide association studies have been successful at identifying common variants; however, the role of rare variation in IA susceptibility has not been fully explored.
In this study, we report the use of whole exome sequencing (WES) in seven densely affected families (45 individuals) recruited as part of the Familial Intracranial Aneurysm study. WES variants were prioritized by functional prediction, frequency, predicted pathogenicity, and segregation within families. Using these criteria, 68 variants in 68 genes were prioritized across the seven families. Of the genes that were expressed in IA tissue, one gene (TMEM132B) was differentially expressed in aneurysmal samples (n=44) as compared to control samples (n=16) (false discovery rate adjusted p-value=0.023). We demonstrate that sequencing of densely affected families permits exploration of the role of rare variants in a relatively common disease such as IA, although there are important study design considerations for applying sequencing to complex disorders. In this study, we explore methods of WES variant prioritization, including the incorporation of unaffected individuals, multipoint linkage analysis, biological pathway information, and transcriptome profiling. Further studies are needed to validate and characterize the set of variants and genes identified in this study.