Predictions on and Analysis of Viral Proteins Encoded by Overlapping Genes
Abstract
Overlapping genes are adjacent genes that share a portion of their coding sequence. Such genes are often observed in the compact genomes of viruses, prokaryotes, and mitochondria, and they are also seen in human and other mammalian genomes. Gene overlap is a way of minimizing genome size and maximizing coding capacity. Overlapping genes produce different proteins. A major task in the post-genomic era is the large-scale study of the structures and functions of proteins, which play crucial roles in virtually all biological processes. In general, it is assumed that 3-D structure determines the function of a protein, but many proteins or regions of proteins may function in the absence of 3-D structure. The term disordered is used to describe these proteins and regions. A large number of studies have shown that biological functions depend on both ordered and disordered proteins. Natively disordered regions are common and play essential roles in many proteins, especially in activities involved in signaling and regulation.
The goal of this research was to analyze the ordered and disordered tendencies of viral proteins encoded by overlapping genes. Our hypothesis is that, in a pair of proteins or protein regions encoded by overlapping genes, at least one member of the pair is disordered (unstructured). This hypothesis is based on the observation that structured (ordered) proteins require highly specific amino acid sequences, while unstructured (disordered) sequences are essentially unconstrained. Thus, given a structured protein and its associated mRNA sequence, any sequence derived from an overlapping reading frame seems highly unlikely to have a sequence pattern commensurate with a structured protein; a sequence pattern consistent with a disordered protein seems much more likely. We studied the protein products of overlapping gene sequences, tested the hypothesis, and addressed two questions. First, do the proteins encoded by overlapping genes have opposite order-disorder content, that is, does the ordered part of one of the overlapping proteins correspond to a disordered part in the other? Second, do the encoded proteins have more disordered amino acids in the overlapping regions than in the non-overlapping regions? Using our database of overlapping viral genes and the disorder predictor PONDR VL3, we predicted the order or disorder of each amino acid in the sequences of 97 viral proteins. An analysis of the results supported our hypothesis and indicated that ordered amino acids are mostly associated with non-overlapping regions, while disordered amino acids are more prevalent in overlapping regions. In the overlapping regions of 52 protein pairs, most of the amino acid pairs facing each other on the two protein sequences included at least one disordered amino acid.
Of the 52 pairs, 3 had no disordered amino acids and 22 had no ordered amino acids on either sequence. The fraction of ordered pairs in the pooled overlapping regions of the 52 protein pairs was 0.28. The non-overlapping regions of the 97 proteins were predominantly ordered: the fraction of ordered amino acids in the pooled non-overlapping regions was 0.77.
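The fractions reported above can be illustrated with a minimal Python sketch. It is not the procedure used in this work. It assumes that each protein has already been reduced to a string of per-residue labels ('O' for ordered, 'D' for disordered), that the two overlapping regions have been aligned residue by residue, and that an "ordered pair" means both facing residues are ordered. The function names and the 0.5 score cutoff are illustrative assumptions, not details taken from the abstract.

```python
from typing import Dict, Sequence


def label_residues(scores: Sequence[float], threshold: float = 0.5) -> str:
    """Convert per-residue disorder scores to 'D'/'O' labels.

    The 0.5 cutoff is an assumption made for this sketch.
    """
    return "".join("D" if s >= threshold else "O" for s in scores)


def pair_fractions(labels_a: str, labels_b: str) -> Dict[str, float]:
    """Summarize facing residue pairs in two aligned overlapping regions."""
    if len(labels_a) != len(labels_b):
        raise ValueError("aligned regions must have the same length")
    if not labels_a:
        raise ValueError("regions must not be empty")
    n = len(labels_a)
    pairs = list(zip(labels_a, labels_b))
    both_ordered = sum(a == "O" and b == "O" for a, b in pairs)
    both_disordered = sum(a == "D" and b == "D" for a, b in pairs)
    return {
        "both_ordered": both_ordered / n,
        "both_disordered": both_disordered / n,
        # Pairs consistent with the hypothesis: at least one residue disordered.
        "at_least_one_disordered": (n - both_ordered) / n,
    }


def ordered_fraction(labels: str) -> float:
    """Fraction of ordered residues in a (pooled) region."""
    if not labels:
        raise ValueError("region must not be empty")
    return labels.count("O") / len(labels)


if __name__ == "__main__":
    # Toy labels, not data from the study.
    print(pair_fractions("OODD", "ODOD"))
    print(ordered_fraction("OOOD"))
```

Pooling, as described in the abstract, would correspond to concatenating the label strings of all regions of one kind before calling these functions.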