Browsing by Author "Zhou, Yaoqi"
Now showing 1 - 10 of 27
Item Accurate single-sequence prediction of solvent accessible surface area using local and global features
(Wiley Blackwell (John Wiley & Sons), 2014-11) Faraggi, Eshel; Zhou, Yaoqi; Kloczkowski, Andrzej; Department of Biochemistry & Molecular Biology, IU School of Medicine
We present a new approach for predicting the Accessible Surface Area (ASA) using a General Neural Network (GENN). The novelty of the new approach lies in not using residue mutation profiles generated by multiple sequence alignments as descriptive inputs. Instead, we use solely sequential window information and global features such as single-residue and two-residue compositions of the chain. The resulting predictor is both far more efficient than sequence alignment-based predictors and of comparable accuracy to them. Introduction of the global inputs significantly helps achieve this comparable accuracy. The predictor, termed ASAquick, is tested on predicting the ASA of globular proteins and found to perform similarly well for so-called easy and hard cases, indicating generalizability and possible usability for de novo protein structure prediction. The source code and Linux executables for GENN and ASAquick are available from Research and Information Systems at http://mamiris.com, from the SPARKS Lab at http://sparks-lab.org, and from the Battelle Center for Mathematical Medicine at http://mathmed.org.

Item Charting the Unexplored RNA-binding Protein Atlas of the Human Genome
(Office of the Vice Chancellor for Research, 2012-04-13) Zhao, Huiying; Yang, Yuedong; Janga, Sarath Chandra; Chen, Jason; Zhu, Heng; Kao, Cheng; Zhou, Yaoqi
Detecting protein-RNA interactions is challenging, both experimentally and computationally, because RNAs are large in number, diverse in cellular location and function, and flexible in structure. As a result, many RNA-binding proteins (RBPs) remain to be identified and characterized.
Recently, we developed a bioinformatics tool called SPOT-Seq that integrates template-based structure prediction with RNA-binding affinity prediction to predict RBPs. Application of SPOT-Seq to the human genome doubles the number of RBPs from 2115 to 4296. Half of the novel RBPs (>2000) are poorly annotated or not annotated at all. The other half possess Gene Ontology leaf IDs that are associated with known RBPs. In particular, we identified 36 novel RBPs in cancer, cardiovascular, diabetes and neurodegenerative pathways and 26 novel RBPs associated with disease-causing SNPs. Half of these disease-associated, predicted novel RBPs are annotated to interact with known RBPs. The accuracy of the predicted novel RBPs is further validated by the same confirmation rate of novel and annotated RBPs in human proteome microarray experiments. The large number of predicted novel RBPs and their abundance in disease pathways and disease-causing SNPs are useful for hypothesis generation. These predicted novel human RBPs (>2000) with confidence levels and their predicted complex structures with RNA can be downloaded from http://sparks.informatics.iupui.edu (yqzhou@iupui.edu).

Item Computational protein design: assessment and applications
(2015) Li, Zhixiu; Zhou, Yaoqi
Computational protein design aims at designing amino acid sequences that can fold into a target structure and perform a desired function. Many computational design methods have been developed, and their applications have been successful during the past two decades. However, the success rate of protein design remains too low for it to be a useful tool for biochemists who are not experts in computational biology. In this dissertation, we first developed novel computational assessment techniques to assess several state-of-the-art computational techniques. We found that significant progress was made in several important measures by two new scoring functions, from RosettaDesign and from OSCAR-design, respectively.
We also developed the first machine-learning technique, called SPIN, that predicts a sequence profile compatible with a given structure using a novel nonlocal energy-based feature. The accuracy of predicted sequences is comparable to RosettaDesign in terms of sequence identity to wild-type sequences. In the last two application chapters, we designed self-inhibitory peptides of Escherichia coli methionine aminopeptidase (EcMetAP) and de novo designed barstar. Several peptides were confirmed to inhibit EcMetAP with 50% inhibitory concentrations in the micromolar range. Meanwhile, the assessment of designed barstar sequences indicates the improvement of OSCAR-design over RosettaDesign.

Item DescribePROT: database of amino acid-level protein structure and function predictions
(Oxford University Press, 2021-01-08) Zhao, Bi; Katuwawala, Akila; Oldfield, Christopher J.; Dunker, A. Keith; Faraggi, Eshel; Gsponer, Jörg; Kloczkowski, Andrzej; Malhis, Nawar; Mirdita, Milot; Obradovic, Zoran; Söding, Johannes; Steinegger, Martin; Zhou, Yaoqi; Kurgan, Lukasz; Medicine, School of Medicine
We present DescribePROT, the database of predicted amino acid-level descriptors of structure and function of proteins. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position specific scoring matrix, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs and interactions with proteins, DNA and RNAs. Users can search DescribePROT by the amino acid sequence and the UniProt accession number and entry name. The pre-computed results are made available instantaneously.
The predictions can be accessed via an interactive graphical interface that allows simultaneous analysis of multiple descriptors and can also be downloaded in structured formats at the protein, proteome and whole database scale. The putative annotations included in DescribePROT are useful for a broad range of studies, including investigations of protein function, applied projects focusing on therapeutics and diseases, and the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.

Item Direct prediction of profiles of sequences compatible to a protein structure by neural networks with fragment-based local and energy-based nonlocal profiles
(Wiley Online Library, 2014-10) Li, Zhixiu; Yang, Yuedong; Faraggi, Eshel; Zhou, Jian; Zhou, Yaoqi; Department of BioHealth Informatics, IU School of Informatics and Computing
Locating sequences compatible with a protein structural fold is the well-known inverse protein-folding problem. While significant progress has been made, the success rate of protein design remains low. As a result, a library of designed sequences or a profile of sequences is currently employed for guiding experimental screening or directed evolution. Sequence profiles can be computationally predicted by iterative mutations of a random sequence to produce energy-optimized sequences, or by combining sequences of structurally similar fragments in a template library. The latter approach is computationally more efficient but yields less accurate profiles than the former because it lacks tertiary structural information. Here we present a method called SPIN that predicts Sequence Profiles by Integrated Neural network based on fragment-derived sequence profiles and structure-derived energy profiles.
SPIN improves over the fragment-derived profile by 6.7% (from 23.6 to 30.3%) in sequence identity between predicted and wild-type sequences. The method also reduces the number of residues in low-complexity regions by 15.7% and has a significantly better balance of hydrophilic and hydrophobic residues at the protein surface. The accuracy of the sequence profiles obtained is comparable to that of profiles generated by the protein design program RosettaDesign 3.5. This highly efficient method for predicting sequence profiles from structures will be useful as a single-body scoring term for improving scoring functions used in protein design and fold recognition. It also complements protein design programs in guiding experimental design of the sequence library for screening and directed evolution of designed sequences. The SPIN server is available at http://sparks-lab.org.

Item Discriminating between disease-causing and neutral non-frameshifting micro-INDELs by support vector machines by means of integrated sequence- and structure-based features
(Office of the Vice Chancellor for Research, 2013-04-05) Zhao, Huiying; Yang, Yuedong; Lin, Hai; Zhang, Xinjun; Mort, Matthew; Cooper, David N.; Liu, Yunlong; Zhou, Yaoqi
Micro-INDELs (insertions or deletions of ≤20 bp) constitute the second most frequent class of human gene mutation after single nucleotide variants. A significant portion of exonic INDELs are non-frameshifting (NFS), serving to insert or delete a discrete number of amino-acid residues. Despite the relative abundance of NFS-INDELs, their damaging effect on protein structure and function has gone largely unstudied, whilst bioinformatics tools for discriminating between disease-causing and neutral NFS-INDELs remain to be developed.
We have developed such a technique (DDIG-in; Detecting DIsease-causing Genetic variations due to INDELs) by comparing the properties of disease-causing NFS-INDELs from the Human Gene Mutation Database (HGMD) with putatively neutral NFS-INDELs from the 1000 Genomes Project. Having considered 58 different sequence- and structure-based features, we found that predicted disordered regions around the NFS-INDEL region had the highest discriminative capability (disease versus neutral), with an Area Under the receiver-operating characteristic Curve (AUC) of 0.82 and a Matthews Correlation Coefficient (MCC) of 0.56. All features studied were combined by support vector machines (SVM) and selected by a greedy algorithm. The resulting SVM models were trained and tested by ten-fold cross-validation on the microdeletion dataset and independently tested on the microinsertion dataset, and vice versa. The final SVM model for determining NFS-INDEL disease-causing probability was built on non-redundant datasets with a protein sequence identity cutoff of 35% and yielded an MCC value of 0.68, an accuracy of 84% and an AUC of 0.89. Predicted disease-causing probabilities exhibited a strong negative correlation with the average minor allele frequency (correlation coefficient, -0.84). DDIG-in, available at http://sparks.informatics.iupui.edu, can be used to estimate the disease-causing probability for a given NFS-INDEL.

Item ExonImpact: Prioritizing Pathogenic Alternative Splicing Events
(Wiley, 2017-01) Li, Meng; Feng, Weixing; Zhang, Xinjun; Yang, Yuedong; Wang, Kejun; Mort, Matthew; Cooper, David N.; Wang, Yue; Zhou, Yaoqi; Liu, Yunlong; Medicine, School of Medicine
Alternative splicing (AS) is a closely regulated process that allows a single gene to encode multiple protein isoforms, thereby contributing to the diversity of the proteome. Dysregulation of the splicing process has been found to be associated with many inherited diseases.
However, among the pathogenic AS events there are numerous “passenger” events whose inclusion or exclusion does not lead to significant changes with respect to protein function. In this study, we evaluate the secondary and tertiary structural features of proteins associated with disease-causing and neutral AS events, and show that several structural features are strongly associated with the pathological impact of exon inclusion. We further develop a machine learning-based computational model, ExonImpact, for prioritizing and evaluating the functional consequences of hitherto uncharacterized AS events. We evaluated our model using several strategies, including cross-validation and data from the Genotype-Tissue Expression (GTEx) and ClinVar databases. ExonImpact is freely available at http://watson.compbio.iupui.edu/ExonImpact.

Item From sequence to structure, to function, and back again: Integrating knowledge-based approaches with physical intuitions for protein folding, binding, and design
(Office of the Vice Chancellor for Research, 2010-04-09) Zhou, Yaoqi
By combining physical and knowledge-based approaches, state-of-the-art bioinformatics tools are developed for protein structure prediction, function prediction (DNA binding) and structure-based protein and ligand design.

Item From sequence to structure, to function, and back again: Integrating knowledge-based approaches with physical intuitions for protein folding, binding, and design
(Office of the Vice Chancellor for Research, 2011-04-08) Zhou, Yaoqi
Most biological activities are directed and/or regulated by proteins made of a gene-specified sequence of 20 amino-acid residue types. As a result, function or malfunction of specific proteins is responsible for almost all diseases. Proteins perform their function through their unique, self-assembled (folded) three-dimensional structures and through their specific binding to small molecules, to DNA/RNA (e.g.
transcription factors that regulate gene expression), or to other proteins (e.g. molecular recognition in signal transduction). Thus, how to predict the structure of a protein from its amino-acid sequence, discover the function from its structure and, then, design the sequence from its function or structure are the most essential problems in structural biology. In this poster, we will illustrate how the coupling of physical intuitions with learning from structural databases can go a long way toward untangling the complex relation between sequence, structure and function of proteins.

Item FUNCTIONAL GENOMICS STUDY TO UNDERSTAND THE ROLE OF SEROTONIN IN MOUSE EMBRYONIC STEM CELLS
(2011-10-19) Nagari, Anusha; Perumal, Narayanan B.; Panicker, Mitradas M.; Zhou, Yaoqi; Pradhan, Meeta
Serotonin (5-hydroxytryptamine, 5-HT) is a monoamine neurotransmitter that is synthesized from the amino acid L-tryptophan and is reported to localize in mitochondria of embryonic stem cells. Even before its role as a neurotransmitter in the mature brain was discovered, 5-HT had been shown to play an important role in regulating brain development. However, there is a lack of knowledge about the downstream target genes regulated by serotonin in embryonic stem (ES) cells. Towards this end, our study helps in understanding transcriptional regulatory mechanisms of 5-HT responsive genes in ES cells. By combining the gene expression data with motif prediction algorithms, literature validation and comparison with public domain data, gene targets specific to endogenous or exogenous 5-HT in ES cells were identified. By performing one-way ANOVA and volcano plot analysis using GeneSpring software, we identified 44 5-HT-induced and 29 5-HT-suppressed genes, likely to be transcriptionally regulated by 4 and 2 TFs, respectively. Motif enrichment analysis on these target genes using MotifScanner revealed that the transcription factor TFAP2A plays a key role in regulating the expression of 5-HT responsive genes.
Furthermore, by comparing our dataset with published expression profiles of ES cells, we observed a number of 5-HT responsive target genes showing enrichment in ES cells. Genes such as Nanog, Slc38a5, Hoxb1 and Eif2s1 from this analysis have been observed to be components of ‘stemness’ phenotypes reported in the literature. Functional annotation of the 5-HT responsive genes identified gene ontologies such as regulation of translation in response to stress and energy derivation by oxidation, suggesting a regulatory role for 5-HT in mitochondrial functions of ES cells. Additionally, enrichment of other biological process terms such as development of various parts of the nervous system, cell adhesion, and apoptosis suggests that 5-HT target genes may play an important role in ES cell differentiation. Our study implemented a new combinatorial approach for identifying gene regulatory mechanisms involved in 5-HT responsive genes and proposed a potential mediatory role for serotonin in ES cell differentiation and growth. Thus, this study provides potential 5-HT target genes in ES cells for biological validation.