IU Indianapolis ScholarWorks :: Browsing by Author "Romero, Pedro"

D2P2: database of disordered protein predictions
(Oxford University Press, 2013) Oates, Matt E.; Romero, Pedro; Ishida, Takashi; Ghalwash, Mohamed; Mizianty, Marcin J.; Xue, Bin; Dosztányi, Zsuzsanna; Uversky, Vladimir N.; Obradovic, Zoran; Kurgan, Lukasz; Dunker, A. Keith; Gough, Julian; Center for Computational Biology and Bioinformatics, School of Medicine
We present the Database of Disordered Protein Prediction (D2P2), available at http://d2p2.pro (including website source code). A battery of disorder predictors and their variants, VL-XT, VSL2b, PrDOS, PV2, Espritz and IUPred, were run on all protein sequences from 1765 complete proteomes (to be updated as more genomes are completed). Integrated with these results are all of the predicted (mostly structured) SCOP domains using the SUPERFAMILY predictor. These disorder/structure annotations together enable comparison of the disorder predictors with each other and examination of the overlap between disordered predictions and SCOP domains on a large scale. D2P2 will increase our understanding of the interplay between disorder and structure, the genomic distribution of disorder, and its evolutionary history. The parsed data are made available in a unified format for download as flat files or SQL tables either by genome, by predictor, or for the complete set. An interactive website provides a graphical view of each protein annotated with the SCOP domains and disordered regions from all predictors overlaid (or shown as a consensus). There are statistics and tools for browsing and comparing genomes and their disorder within the context of their position on the tree of life.
Improving protein order-disorder classification using charge-hydropathy plots
(Springer (Biomed Central Ltd.), 2014) Huang, Fei; Oldfield, Christopher J.; Xue, Bin; Hsu, Wei-Lun; Meng, Jingwei; Liu, Xiaowen; Shen, Li; Romero, Pedro; Uversky, Vladimir N.; Dunker, A. Keith; Department of Biochemistry and Molecular Biology, IU School of Medicine
BACKGROUND: The earliest whole protein order/disorder predictor (Uversky et al., Proteins, 41: 415-427 (2000)), herein called the charge-hydropathy (C-H) plot, was originally developed using the Kyte-Doolittle (1982) hydropathy scale (Kyte & Doolittle, J. Mol. Biol., 157: 105-132 (1982)). Here the goal is to determine whether the performance of the C-H plot in separating structured and disordered proteins can be improved by using an alternative hydropathy scale. RESULTS: Using the performance of the C-H plot as the metric, we compared 19 alternative hydropathy scales, with the finding that the Guy (1985) hydropathy scale (Guy, Biophys. J., 47: 61-70 (1985)) was the best of the tested hydropathy scales for separating large collections of structured proteins and intrinsically disordered proteins (IDPs) on the C-H plot. Next, we developed a new scale, named IDP-Hydropathy, which further improves the discrimination between structured proteins and IDPs. Applying the C-H plot to a dataset containing 109 IDPs and 563 non-homologous fully structured proteins, the Kyte-Doolittle (1982) hydropathy scale, the Guy (1985) hydropathy scale, and the IDP-Hydropathy scale gave balanced two-state classification accuracies of 79%, 84%, and 90%, respectively, indicating that a very substantial overall improvement is obtained by using different hydropathy scales. A correlation study shows that IDP-Hydropathy is strongly correlated with other hydropathy scales, thus suggesting that IDP-Hydropathy probably has only minor contributions from amino acid properties other than hydropathy. CONCLUSION: We suggest that IDP-Hydropathy would likely be the best scale to use for any type of algorithm developed to predict protein disorder.
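As an illustration of the C-H plot idea described in the abstract above, the following is a minimal Python sketch, not the authors' implementation. It assumes the Kyte-Doolittle scale rescaled to the 0-1 range, treats K/R as +1 and D/E as -1 (histidine neutral), and uses the boundary line commonly quoted from Uversky et al. (2000), <R> = 2.785<H> - 1.151. The function names and the balanced-accuracy helper are hypothetical, and neither the Guy scale nor the IDP-Hydropathy scale is reproduced here.

```python
# Kyte-Doolittle (1982) hydropathy values for the 20 standard residues.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq):
    """Mean hydropathy <H>, with each value rescaled from [-4.5, 4.5] to [0, 1].

    Raises KeyError for non-standard residue letters.
    """
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(seq):
    """Absolute mean net charge <R> (assumption: K/R = +1, D/E = -1, His neutral)."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return abs(positive - negative) / len(seq)


def predicted_disordered(seq):
    """True if the sequence lies on the disordered side of the assumed boundary."""
    # Assumed boundary line: <R> = 2.785 * <H> - 1.151
    return mean_net_charge(seq) > 2.785 * mean_hydropathy(seq) - 1.151


def balanced_accuracy(predicted, actual):
    """Mean of sensitivity and specificity for two-state (True = disordered) labels."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    true_neg = sum(1 for p, a in pairs if not p and not a)
    n_pos = sum(1 for _, a in pairs if a)
    n_neg = len(pairs) - n_pos
    return 0.5 * (true_pos / n_pos + true_neg / n_neg)
```

Balanced accuracy is a natural choice for a dataset like the one described (109 IDPs versus 563 structured proteins), because it weights the two classes equally despite the imbalance. Swapping in a different hydropathy scale would only require replacing the lookup table and refitting the boundary line.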
Informatics Approaches to Linking Mutations to Biological Pathways, Networks and Clinical Data
(2011-07-08) Singh, Arti; Mooney, Sean; Jung, Jeesun; Romero, Pedro
The information gained from sequencing of the human genome has begun to transform human biology and genetic medicine. The discovery of functionally important genetic variation lies at the heart of these endeavors, and there has been substantial progress in understanding the common patterns of single-nucleotide polymorphisms (SNPs), the most frequent type of variation in humans. Although more than 99% of human DNA sequences are the same across the population, variations in DNA sequence have a major impact on how we humans respond to disease; to environmental entities such as bacteria, viruses, toxins, and chemicals; and to drugs and other therapies. Studying differences between our genomes is therefore vital. This makes SNPs, as well as other genetic variation data, of great value for biomedical research and for developing pharmaceutical products or medical diagnostics. The goal of the project is to link genetic variation data to biological pathways and networks data, and also to clinical data, to create a framework for translational and systems biology studies. The study of the interactions between the components of biological systems and biological pathways has become increasingly important. It is widely accepted by scientists that it is as important to study biological entities as interacting systems as it is to study them in isolation. This project is rooted in this thinking and aims to integrate a genetic variation dataset with a biological pathways dataset. Annotating genetic variation data with standardized disease notation is a very difficult yet important endeavor. One of the goals of this research is to identify whether informatics approaches can be applied to automatically annotate genetic variation data with a classification of diseases.
Intrinsic Disorder and Protein Evolution: Amino Acid Composition of Proteins in Last Universal Ancestor
Karne, Sai Harish Babu; Romero, Pedro
Not all twenty amino acids appeared simultaneously in nature. Instead, some of them appeared early, while others were added to the genetic code later. The amino acids that were formed by Miller (1953) are suggested to have appeared early in evolutionary history, and the amino acids associated with codon capture developed late in the course of evolution. The chronological order of appearance of the amino acids proposed by Trifonov (2000) was G/A, V/D, P, S, E/L, T, R, N, K, Q, I, C, H, F, M, Y, W. According to Romero et al. (1997), amino acids G, D, E, P and S are disorder-promoting residues and C, F, W and Y are order-promoting residues. This means that the early, or ancient, amino acids were disorder promoting and the order-promoting residues came late into the genetic code. These observations led to the hypothesis that the first proteins, which were composed of the early amino acids only, were disordered, and, furthermore, that the appearance of the late amino acids and the appearance of the structural proteins were concurrent. Software developed by Brooks et al. (2004) to find the amino acid composition of the LUA (Last Universal Ancestor) was used to test this hypothesis. For this work, the Clusters of Orthologous Groups of proteins (65 COGs) were split into enzymes and non-enzymes. It was found that intrinsic disorder was abundant in both groups of proteins, with non-enzymes being much more disordered than enzymes. Further analysis was done to check the frequency of the modern amino acids C, F, W, and Y in the Protein Data Bank (PDB) and Swiss-Prot.
Intrinsic disorder in protein products of newborn genes
(2011-10-19) K., S.; Romero, Pedro; Perumal, Narayanan B.; Dunker, Keith
There are many mechanisms for the creation of new genes. In this study, newborn genes, i.e. de novo genes, are genes that are created from scratch. These are created by two mechanisms, polymerization (de novo genes produced from non-coding regions) and overprinting (de novo genes produced from overlapping frames). Rancurel et al. found that de novo genes in overlapping coding regions tend to be more disordered than their ancestral counterparts. It was suggested that it is natural for newborn genes to be disordered, as it must be very difficult for newborn genes to obtain order at such an early stage, so that structure is only developed through later evolution. The two hypotheses tested in this study state (1) that genes generated de novo will have a tendency to be disordered, and (2) that this tendency is due to a natural inclination of these genes to be disordered at birth. The origin and evolution of some de novo coding regions have been studied in detail. We analyzed genes reported in the literature that have been produced de novo, either by overprinting or by polymerization, and their tendency for disorder was evaluated using the VSL2 disorder predictor. The de novo coding regions produced in both ways indeed show a tendency towards disorder, which supports hypothesis 1. For hypothesis 2 to be tested on a larger dataset, the exonic and intronic materials of two human chromosomes were studied and the tendency for disorder was assessed for any new peptide sequence arising from the translation of non-coding sequences arising from introns and exons (overlapping frames). It was shown that protein products of newborn genes arising from introns were not inclined towards being either ordered or disordered, but they can become disordered through evolution.
The new exonic material created from the existing exons tends to be more disordered when translated, and this tendency does not seem to be dependent upon the disorder content of the original exons. This difference could be a consequence of the fact that the overlapping frames of coding sequences have indirectly been subjected to evolutionary pressure along with the original exon, whereas intronic sequences do not seem to have this constraint, but the exact nature of this discrepancy needs further study to be explained. The tendency for disorder in the existing new exons seems to be higher than in the artificial exons (generated in this study). We conclude that the intrinsic disorder in the protein products of de novo genes is selected by evolution rather than being an initial condition. Thus, the newborn genes were not born disordered.
Intrinsic disorder in Viral Proteins Genome-Linked: experimental and predictive analyses
(BioMed Central, 2009-02-16) Hébrard, Eugénie; Bessin, Yannick; Michon, Thierry; Longhi, Sonia; Uversky, Vladimir N.; Delalande, François; Van Dorsselaer, Alain; Romero, Pedro; Walter, Jocelyne; Declerck, Nathalie; Fargette, Denis; Biochemistry and Molecular Biology, School of Medicine
Background VPgs are viral proteins linked to the 5' end of some viral genomes. Interactions between several VPgs and eukaryotic translation initiation factors eIF4Es are critical for plant infection. However, VPgs are not restricted to phytoviruses, being also involved in genome replication and protein translation of several animal viruses. To date, structural data are still limited to small picornaviral VPgs. Recently three phytoviral VPgs were shown to be natively unfolded proteins. Results In this paper, we report the bacterial expression, purification and biochemical characterization of two phytoviral VPgs, namely the VPgs of Rice yellow mottle virus (RYMV, genus Sobemovirus) and Lettuce mosaic virus (LMV, genus Potyvirus). Using far-UV circular dichroism and size exclusion chromatography, we show that RYMV and LMV VPgs are predominantly or partly unstructured in solution, respectively. Using several disorder predictors, we show that both proteins are predicted to possess disordered regions. We next extend these results to 14 VPgs representative of the viral diversity. Disordered regions were predicted in all VPg sequences whatever the genus and the family. Conclusion Based on these results, we propose that intrinsic disorder is a common feature of VPgs. The functional role of intrinsic disorder is discussed in light of the biological roles of VPgs.
Many-to-one binding by intrinsically disordered protein regions
(WORLD SCIENTIFIC, 2019-11-02) Alterovitz, Wei-Lun; Faraggi, Eshel; Oldfield, Christopher J.; Meng, Jingwei; Xue, Bin; Huang, Fei; Romero, Pedro; Kloczkowski, Andrzej; Uversky, Vladimir N.; Dunker, A. Keith; Biochemistry and Molecular Biology, School of Medicine
Disordered binding regions (DBRs), which are embedded within intrinsically disordered proteins or regions (IDPs or IDRs), enable IDPs or IDRs to mediate multiple protein-protein interactions. DBR-protein complexes were collected from the Protein Data Bank for which two or more DBRs having different amino acid sequences bind to the same (100% sequence identical) globular protein partner, a type of interaction herein called many-to-one binding. Two distinct binding profiles were identified: independent and overlapping. For the overlapping binding profiles, the distinct DBRs interact by means of almost identical binding sites (herein called “similar”), or the binding sites contain both common and divergent interaction residues (herein called “intersecting”). Further analysis of the sequence and structural differences among these three groups indicates how IDP flexibility allows different segments to adjust to similar, intersecting, and independent binding pockets.
Secondary Structure, a Missing Component of Sequence-Based Minimotif Definitions
(Public Library of Science, 2012) Sargeant, David P.; Gryk, Michael R.; Maciejewski, Mark W.; Thapar, Vishal; Kundeti, Vamsi; Rajasekaran, Sanguthevar; Romero, Pedro; Dunker, Keith; Li, Shun-Cheng; Kaneko, Tomonori; Schiller, Martin R.; Center for Computational Biology and Bioinformatics, School of Medicine
Minimotifs are short contiguous segments of proteins that have a known biological function. The hundreds of thousands of minimotifs discovered thus far are an important part of the theoretical understanding of the specificity of protein-protein interactions, posttranslational modifications, and signal transduction that occur in cells. However, a longstanding problem is that the different abstractions of the sequence definitions do not accurately capture the specificity, despite decades of effort by many labs. We present evidence that structure is an essential component of minimotif specificity, yet is not used in minimotif definitions. Our analysis of several known minimotifs as case studies, analysis of occurrences of minimotifs in structured and disordered regions of proteins, and review of the literature support a new model for minimotif definitions that includes sequence, structure, and function.
The unfoldomics decade: an update on intrinsically disordered proteins
(BioMed Central, 2008-09-16) Dunker, A. Keith; Oldfield, Christopher J.; Meng, Jingwei; Romero, Pedro; Yang, Jack Y.; Walton Chen, Jessica; Vacic, Vladimir; Obradovic, Zoran; Uversky, Vladimir N.; Biochemistry and Molecular Biology, School of Medicine
Background Our first predictor of protein disorder was published just over a decade ago in the Proceedings of the IEEE International Conference on Neural Networks (Romero P, Obradovic Z, Kissinger C, Villafranca JE, Dunker AK (1997) Identifying disordered regions in proteins from amino acid sequence. Proceedings of the IEEE International Conference on Neural Networks, 1: 90–95). By now more than twenty other laboratory groups have joined the efforts to improve the prediction of protein disorder. While the various prediction methodologies used for protein intrinsic disorder resemble those methodologies used for secondary structure prediction, the two types of structures are entirely different. For example, the two structural classes have very different dynamic properties, with the irregular secondary structure class being much less mobile than the disorder class. The prediction of secondary structure has been useful. On the other hand, the prediction of intrinsic disorder has been revolutionary, leading to major modifications of the more than 100 year-old views relating protein structure and function. Experimentalists have been providing evidence over many decades that some proteins lack fixed structure or are disordered (or unfolded) under physiological conditions. In addition, experimentalists are also showing that, for many proteins, their functions depend on the unstructured rather than structured state; such results are in marked contrast to the greater than hundred year old views such as the lock and key hypothesis. Despite extensive data on many important examples, including disease-associated proteins, the importance of disorder for protein function has been largely ignored. Indeed, to our knowledge, current biochemistry books don't present even one acknowledged example of a disorder-dependent function, even though some reports of disorder-dependent functions are more than 50 years old. 
The results from genome-wide predictions of intrinsic disorder and the results from other bioinformatics studies of intrinsic disorder are demanding attention for these proteins. Results Disorder prediction has been important for showing that the relatively few experimentally characterized examples are members of a very large collection of related disordered proteins that are wide-spread over all three domains of life. Many significant biological functions are now known to depend directly on, or are importantly associated with, the unfolded or partially folded state. Here our goal is to review the key discoveries and to weave these discoveries together to support novel approaches for understanding sequence-function relationships. Conclusion Intrinsically disordered protein is common across the three domains of life, but especially common among the eukaryotic proteomes. Signaling sequences and sites of posttranslational modifications are frequently, or very likely most often, located within regions of intrinsic disorder. Disorder-to-order transitions are coupled with the adoption of different structures with different partners. Also, the flexibility of intrinsic disorder helps different disordered regions to bind to a common binding site on a common partner. Such capacity for binding diversity plays important roles in both protein-protein interaction networks and likely also in gene regulation networks. Such disorder-based signaling is further modulated in multicellular eukaryotes by alternative splicing, for which such splicing events map to regions of disorder much more often than to regions of structure. Associating alternative splicing with disorder rather than structure alleviates theoretical and experimentally observed problems associated with the folding of different length, isomeric amino acid sequences. 
The combination of disorder and alternative splicing is proposed to provide a mechanism for easily "trying out" different signaling pathways, thereby providing the mechanism for generating signaling diversity and enabling the evolution of cell differentiation and multicellularity. Finally, several recent small molecules of interest as potential drugs have been shown to act by blocking protein-protein interactions based on intrinsic disorder of one of the partners. Study of these examples has led to a new approach for drug discovery, and bioinformatics analysis of the human proteome suggests that various disease-associated proteins are very rich in such disorder-based drug discovery targets.
