Browsing by Author "Yang, Yuedong"
Item Charting the Unexplored RNA-binding Protein Atlas of the Human Genome (Office of the Vice Chancellor for Research, 2012-04-13) Zhao, Huiying; Yang, Yuedong; Janga, Sarath Chandra; Chen, Jason; Zhu, Heng; Kao, Cheng; Zhou, Yaoqi
Detecting protein-RNA interactions is challenging, both experimentally and computationally, because RNAs are large in number, diverse in cellular location and function, and flexible in structure. As a result, many RNA-binding proteins (RBPs) remain to be identified and characterized. Recently, we developed a bioinformatics tool called SPOT-Seq that integrates template-based structure prediction with RNA-binding affinity prediction to predict RBPs. Application of SPOT-Seq to the human genome doubles the number of RBPs from 2115 to 4296. Half of the novel (>2000) RBPs are poorly annotated or not annotated at all. The other half possess Gene Ontology leaf IDs that are associated with known RBPs. In particular, we identified 36 novel RBPs in cancer, cardiovascular, diabetes and neurodegenerative pathways and 26 novel RBPs associated with disease-causing SNPs. Half of these disease-associated, predicted novel RBPs are annotated to interact with known RBPs. The accuracy of the predicted novel RBPs is further validated by the same confirmation rate for novel and annotated RBPs in human proteome microarray experiments. The large number of predicted novel RBPs and their abundance in disease pathways and disease-causing SNPs are useful for hypothesis generation.
These predicted novel human RBPs (>2000), with confidence levels and predicted complex structures with RNA, can be downloaded from http://sparks.informatics.iupui.edu (yqzhou@iupui.edu).

Item Direct prediction of profiles of sequences compatible to a protein structure by neural networks with fragment-based local and energy-based nonlocal profiles (Wiley Online Library, 2014-10) Li, Zhixiu; Yang, Yuedong; Faraggi, Eshel; Zhou, Jian; Zhou, Yaoqi; Department of BioHealth Informatics, IU School of Informatics and Computing
Locating sequences compatible with a protein structural fold is the well-known inverse protein-folding problem. While significant progress has been made, the success rate of protein design remains low. As a result, a library of designed sequences or profile of sequences is currently employed for guiding experimental screening or directed evolution. Sequence profiles can be computationally predicted by iterative mutations of a random sequence to produce energy-optimized sequences, or by combining sequences of structurally similar fragments in a template library. The latter approach is computationally more efficient but yields less accurate profiles than the former because it lacks tertiary structural information. Here we present a method called SPIN that predicts Sequence Profiles by Integrated Neural network based on fragment-derived sequence profiles and structure-derived energy profiles. SPIN improves over the fragment-derived profile by 6.7 percentage points (from 23.6% to 30.3%) in sequence identity between predicted and wild-type sequences. The method also reduces the number of residues in low-complexity regions by 15.7% and has a significantly better balance of hydrophilic and hydrophobic residues at the protein surface. The accuracy of the sequence profiles obtained is comparable to that of profiles generated by the protein design program RosettaDesign 3.5.
This highly efficient method for predicting sequence profiles from structures will be useful as a single-body scoring term for improving scoring functions used in protein design and fold recognition. It also complements protein design programs in guiding experimental design of the sequence library for screening and directed evolution of designed sequences. The SPIN server is available at http://sparks-lab.org.

Item Discriminating between disease-causing and neutral non-frameshifting micro-INDELs by support vector machines by means of integrated sequence- and structure-based features (Office of the Vice Chancellor for Research, 2013-04-05) Zhao, Huiying; Yang, Yuedong; Lin, Hai; Zhang, Xinjun; Mort, Matthew; Cooper, David N.; Liu, Yunlong; Zhou, Yaoqi
Micro-INDELs (insertions or deletions of ≤20 bp) constitute the second most frequent class of human gene mutation after single nucleotide variants. A significant portion of exonic INDELs are non-frameshifting (NFS), serving to insert or delete a discrete number of amino-acid residues. Despite the relative abundance of NFS-INDELs, their damaging effect on protein structure and function has gone largely unstudied whilst bioinformatics tools for discriminating between disease-causing and neutral NFS-INDELs remain to be developed. We have developed such a technique (DDIG-in; Detecting DIsease-causing Genetic variations due to INDELs) by comparing the properties of disease-causing NFS-INDELs from the Human Gene Mutation Database (HGMD) with putatively neutral NFS-INDELs from the 1,000 Genomes Project. Having considered 58 different sequence- and structure-based features, we found that predicted disordered regions around the NFS-INDEL region had the highest discriminative capability (disease versus neutral) with an Area Under the receiver-operating characteristic Curve (AUC) of 0.82 and a Matthews Correlation Coefficient (MCC) of 0.56.
All features studied were combined by support vector machines (SVM) and selected by a greedy algorithm. The resulting SVM models were trained and tested by ten-fold cross-validation on the microdeletion dataset and independently tested on the microinsertion dataset and vice versa. The final SVM model for determining NFS-INDEL disease-causing probability was built on non-redundant datasets with a protein sequence identity cutoff of 35% and yielded an MCC value of 0.68, an accuracy of 84% and an AUC of 0.89. Predicted disease-causing probabilities exhibited a strong negative correlation with the average minor allele frequency (correlation coefficient, -0.84). DDIG-in, available at http://sparks.informatics.iupui.edu, can be used to estimate the disease-causing probability for a given NFS-INDEL.

Item ExonImpact: Prioritizing Pathogenic Alternative Splicing Events (Wiley, 2017-01) Li, Meng; Feng, Weixing; Zhang, Xinjun; Yang, Yuedong; Wang, Kejun; Mort, Matthew; Cooper, David N.; Wang, Yue; Zhou, Yaoqi; Liu, Yunlong; Medicine, School of Medicine
Alternative splicing (AS) is a closely regulated process that allows a single gene to encode multiple protein isoforms, thereby contributing to the diversity of the proteome. Dysregulation of the splicing process has been found to be associated with many inherited diseases. However, among the pathogenic AS events there are numerous “passenger” events whose inclusion or exclusion does not lead to significant changes with respect to protein function. In this study, we evaluate the secondary and tertiary structural features of proteins associated with disease-causing and neutral AS events, and show that several structural features are strongly associated with the pathological impact of exon inclusion. We further develop a machine learning-based computational model, ExonImpact, for prioritizing and evaluating the functional consequences of hitherto uncharacterized AS events.
We evaluated our model using several strategies, including cross-validation and data from the Genotype-Tissue Expression (GTEx) and ClinVar databases. ExonImpact is freely available at http://watson.compbio.iupui.edu/ExonImpact.

Item Investigating DNA-, RNA-, and protein-based features as a means to discriminate pathogenic synonymous variants (Wiley, 2017) Livingstone, Mark; Folkman, Lukas; Yang, Yuedong; Zhang, Ping; Mort, Matthew; Cooper, David N.; Liu, Yunlong; Stantic, Bela; Zhou, Yaoqi; Department of Medical & Molecular Genetics, IU School of Medicine
Synonymous single-nucleotide variants (SNVs), although they do not alter the encoded protein sequences, have been implicated in many genetic diseases. Experimental studies indicate that synonymous SNVs can lead to changes in the secondary and tertiary structures of DNA and RNA, thereby affecting translational efficiency, cotranslational protein folding as well as the binding of DNA-/RNA-binding proteins. However, the importance of these various features in disease phenotypes is not clearly understood. Here, we have built a support vector machine (SVM) model (termed DDIG-SN) as a means to discriminate disease-causing synonymous variants. The model was trained and evaluated on nearly 900 disease-causing variants. The method achieves robust performance, with areas under the receiver operating characteristic curve of 0.84 and 0.85 for protein-stratified 10-fold cross-validation and independent testing, respectively. We were able to show that the disease-causing effects in the immediate proximity to exon–intron junctions (1–3 bp) are driven by the loss of splicing motif strength, whereas the gain of splicing motif strength is the primary cause in regions further away from the splice site (4–69 bp).
The method is available as a part of the DDIG server at http://sparks-lab.org/ddig.

Item Prediction and validation of the unexplored RNA-binding protein atlas of the human proteome (2014-04) Zhao, Huiying; Yang, Yuedong; Janga, Sarath Chandra; Kao, C. Cheng; Zhou, Yaoqi
Detecting protein-RNA interactions is challenging both experimentally and computationally because RNAs are large in number, diverse in cellular location and function, and flexible in structure. As a result, many RNA-binding proteins (RBPs) remain to be identified. Here, a template-based, function-prediction technique SPOT-Seq for RBPs is applied to the human proteome and its result is validated by a recent proteomic experimental discovery of 860 mRNA-binding proteins (mRBPs). The coverage (or sensitivity) is 42.6% for 1217 known RBPs annotated in the Gene Ontology and 43.6% for 860 newly discovered human mRBPs. Consistent sensitivity indicates the robust performance of SPOT-Seq for predicting RBPs. More importantly, SPOT-Seq detects 2418 novel RBPs in the human proteome, 291 of which were validated by the newly discovered mRBP set. Among the 291 validated novel RBPs, 61 are not homologous to any known RBPs. Successful validation of predicted novel RBPs permits further analysis of their phenotypic roles in disease pathways. The dataset of 2418 predicted novel RBPs along with confidence levels and complex structures is available at http://sparks-lab.org (in publications) for experimental confirmations and hypothesis generation.

Item Prediction and validation of the unexplored RNA-binding protein atlas of the human proteome (Wiley, 2014-04) Zhao, Huiying; Yang, Yuedong; Janga, Sarath Chandra; Kao, C. Cheng; Zhou, Yaoqi; Department of Medicine, IU School of Medicine
Detecting protein-RNA interactions is challenging both experimentally and computationally because RNAs are large in number, diverse in cellular location and function, and flexible in structure.
As a result, many RNA-binding proteins (RBPs) remain to be identified. Here, a template-based, function-prediction technique SPOT-Seq for RBPs is applied to the human proteome and its result is validated by a recent proteomic experimental discovery of 860 mRNA-binding proteins (mRBPs). The coverage (or sensitivity) is 42.6% for 1217 known RBPs annotated in the Gene Ontology and 43.6% for 860 newly discovered human mRBPs. Consistent sensitivity indicates the robust performance of SPOT-Seq for predicting RBPs. More importantly, SPOT-Seq detects 2418 novel RBPs in the human proteome, 291 of which were validated by the newly discovered mRBP set. Among the 291 validated novel RBPs, 61 are not homologous to any known RBPs. Successful validation of predicted novel RBPs permits further analysis of their phenotypic roles in disease pathways. The dataset of 2418 predicted novel RBPs along with confidence levels and complex structures is available at http://sparks-lab.org (in publications) for experimental confirmations and hypothesis generation.

Item regSNPs-splicing: a tool for prioritizing synonymous single-nucleotide substitution (Springer, 2017) Zhang, Xinjun; Li, Meng; Lin, Hai; Rao, Xi; Feng, Weixing; Yang, Yuedong; Mort, Matthew; Cooper, David N.; Wang, Yue; Wang, Yadong; Wells, Clark; Zhou, Yaoqi; Liu, Yunlong; Department of Medical & Molecular Genetics, IU School of Medicine
While synonymous single-nucleotide variants (sSNVs) have largely been unstudied, since they do not alter protein sequence, mounting evidence suggests that they may affect RNA conformation, splicing, and the stability of nascent mRNAs to promote various diseases. Accurately prioritizing deleterious sSNVs from a pool of neutral ones can significantly improve our ability to select functional genetic variants identified from various genome-sequencing projects and, therefore, advance our understanding of disease etiology.
In this study, we develop a computational algorithm to prioritize sSNVs based on their impact on mRNA splicing and protein function. In addition to genomic features that potentially affect splicing regulation, our proposed algorithm also includes dozens of structural features that characterize the effects of alternatively spliced exons on protein function. Our systematic evaluation on thousands of sSNVs suggests that several structural features, including intrinsic disorder protein scores, solvent accessible surface areas, protein secondary structures, and known and predicted protein family domains, show significant differences between disease-causing and neutral sSNVs. Our results suggest that the protein structure features offer an added dimension of information when distinguishing disease-causing from neutral synonymous variants. The inclusion of structural features increases the predictive accuracy for functional sSNV prioritization.

Item The Role of Semidisorder in Temperature Adaptation of Bacterial FlgM Proteins (Elsevier B.V., 2013-12-03) Wang, Jihua; Yang, Yuedong; Cao, Zanxia; Li, Zhixiu; Zhao, Huiying; Zhou, Yaoqi; Department of Biochemistry & Molecular Biology, IU School of Medicine
Probabilities of disorder for FlgM proteins of 39 species whose optimal growth temperature ranges from 273 K (0°C) to 368 K (95°C) were predicted by a newly developed method called Sequence-based Prediction with Integrated NEural networks for Disorder (SPINE-D). We showed that the temperature-dependent behavior of FlgM proteins could be separated into two subgroups according to their sequence lengths. Only shorter sequences evolved to adapt to high temperatures (>318 K or 45°C). Their ability to adapt to high temperatures was achieved through a transition from a fully disordered state with little secondary structure to a semidisordered state with high predicted helical probability at the N-terminal region. The predicted results are consistent with available experimental data.
An analysis of all orthologous protein families in 39 species suggests that such a transition from a fully disordered state to semidisordered and/or ordered states is one of the strategies employed by nature for adaptation to high temperatures.

Item Self-derived structure-disrupting peptides targeting methionine aminopeptidase in pathogenic bacteria: a new strategy to generate antimicrobial peptides (Federation of American Societies for Experimental Biology (FASEB), 2019-02) Zhan, Jian; Jia, Husen; Semchenko, Evgeny A.; Bian, Yunqiang; Zhou, Amy M.; Li, Zhixiu; Yang, Yuedong; Wang, Jihua; Sarkar, Sohinee; Totsika, Makrina; Blanchard, Helen; Jen, Freda E.-C.; Ye, Qizhuang; Haselhorst, Thomas; Jennings, Michael P.; Seib, Kate L.; Zhou, Yaoqi; Biochemistry and Molecular Biology, School of Medicine
Bacterial infection is one of the leading causes of death in young, elderly, and immune-compromised patients. The rapid spread of multi-drug-resistant (MDR) bacteria is a global health emergency and there is a lack of new drugs to control MDR pathogens. We describe a heretofore-unexplored discovery pathway for novel antibiotics that is based on self-targeting, structure-disrupting peptides. We show that a helical peptide, KFF-EcH3, derived from the Escherichia coli methionine aminopeptidase can disrupt secondary and tertiary structure of this essential enzyme, thereby killing the bacterium (including MDR strains). Significantly, no detectable resistance developed against this peptide. Based on a computational analysis, our study predicted that peptide KFF-EcH3 has the strongest interaction with the structural core of the methionine aminopeptidase. We further used our approach to identify peptide KFF-NgH1 to target the same enzyme from Neisseria gonorrhoeae. This peptide inhibited bacterial growth and was able to treat a gonococcal infection in a human cervical epithelial cell model.
These findings present an exciting new paradigm in antibiotic discovery using self-derived peptides that can be developed to target the structures of any essential bacterial proteins.