Informatics School Theses and Dissertations
Please go to "Informatics Graduate Theses and PhD Dissertations" to submit dissertations and theses for the School of Informatics and Computing, at: http://hdl.handle.net/1805/303.
Browsing Informatics School Theses and Dissertations by Author "Badve, Abhijit"
Discovery and evolutionary dynamics of RBPs and circular RNAs in mammalian transcriptomes (2015-03-30)
Badve, Abhijit; Janga, Sarath Chandra; Hahn, Matthew William; Liu, Yunlong

RNA-binding proteins (RBPs) are vital post-transcriptional regulators in mammalian transcriptomes and play important roles in controlling the post-transcriptional fate of RNA molecules, yet their evolutionary dynamics remain largely unknown. Studying their expression dynamics is therefore necessary to understand how post-transcriptional networks operate in various mammalian tissues. Because expression profiles of genes encoding RBPs can yield insights into their evolutionary trajectories within post-transcriptional regulatory networks across species, we performed a comparative analysis of RBP expression profiles across 8 tissues (brain, cerebellum, heart, lung, liver, kidney, skeletal muscle, testis) in 11 mammals (human, chimpanzee, gorilla, orangutan, macaque, rat, mouse, platypus, opossum, cow) and in chicken and frog (evolutionary outgroups). Notably, orthologous gene expression profiles suggest a significantly higher expression level for RBPs than for non-RBP genes, which include other protein-coding and non-coding genes, across all the mammalian tissues studied here. This trend is significant irrespective of the tissue and species being compared, though the distributions of RBP gene expression were generally diverse. Our analysis also shows that RBPs are expressed at a significantly lower level in human and mouse tissues than in equivalent tissues of other mammals (chimpanzee, orangutan, rat, etc.). These other mammals are all likely exposed to more diverse natural habitats and ecological settings than humans and mice, whose more stable ecological environment may have reduced the need for complex and extensive post-transcriptional control.
Further analysis of the similarity of orthologous RBP expression profiles between all pairs of tissue-mammal combinations showed that RBP expression profiles group by species, with the tissues of a given mammal clustering together. In contrast, expression profiles for non-RBPs frequently grouped equivalent tissues across diverse mammalian species together. This suggests significant evolution of RBP expression after speciation events. We calculated species specificity indices (SSIs) for RBPs across the various tissues to identify those whose expression is restricted to a few mammals. About 30% of the RBPs are species-specific in at least one tissue studied here, with lung, liver, kidney and testis exhibiting a significantly higher proportion of species-specifically expressed RBPs. We also conducted a differential expression analysis of RBPs in human, mouse and chicken tissues to compare the evolution of expression levels in recently evolved species (i.e., humans and mice) with that in an evolutionarily distant species (i.e., chicken). More than 50% of the orthologous RBPs were differentially expressed in at least one tissue between human and mouse, but not between human and the outgroup chicken, in which RBP expression levels are relatively conserved. Among the studied tissues, brain, liver and kidney showed a higher fraction of differentially expressed RBPs, which may suggest hyper-regulatory activity by RBPs in these tissues over the course of species evolution. Overall, this study forms a foundation for understanding the evolution of RBP expression levels in mammals, providing a snapshot of the wiring patterns of post-transcriptional regulatory networks in mammalian genomes. In our second study, we focused on elucidating novel features of post-transcriptional regulatory molecules called circular RNAs (circRNAs) from long poly(A) RNA-sequencing data.
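The abstract does not state how the species specificity index (SSI) of the first study was computed. As a rough illustration only, the sketch below uses the widely used tau specificity index as a stand-in, which is an assumption and not necessarily the authors' formula. The function names, the 0.8 cutoff and the example values are all hypothetical.

```python
from typing import Dict, Sequence


def specificity_tau(expression: Sequence[float]) -> float:
    """Tau index for one gene in one tissue, computed across species.

    Returns 0.0 when expression is uniform across species and 1.0 when
    the gene is expressed in a single species only.
    ASSUMPTION: tau is used here in place of the unspecified SSI formula.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("need expression values for at least two species")
    if any(x < 0 for x in expression):
        raise ValueError("expression values must be non-negative")
    x_max = max(expression)
    if x_max == 0:
        # Gene not expressed in any species: treated as non-specific
        # (a convention chosen for this sketch).
        return 0.0
    return sum(1.0 - x / x_max for x in expression) / (n - 1)


def fraction_species_specific(
    profiles: Dict[str, Sequence[float]], cutoff: float = 0.8
) -> float:
    """Fraction of genes whose tau meets a (hypothetical) cutoff."""
    if not profiles:
        raise ValueError("no gene profiles given")
    specific = sum(1 for values in profiles.values()
                   if specificity_tau(values) >= cutoff)
    return specific / len(profiles)


# Made-up expression values for two hypothetical genes in one tissue,
# one value per species.
example = {
    "gene_A": [5.0, 5.0, 5.0, 5.0],   # uniform across species
    "gene_B": [0.0, 0.0, 10.0, 0.0],  # restricted to one species
}
share = fraction_species_specific(example)
```

With an index of this kind, a statement such as "about 30% of the RBPs are species-specific in at least one tissue" corresponds to counting the genes whose index passes the chosen cutoff in any tissue.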
The debate over the presence of nonlinear exon splicing, such as exon shuffling or the formation of circularized forms, has finally come to an end, as numerous transcriptomic analyses have demonstrated its occurrence. Previous studies make it evident that non-consensus-site splicing occurs robustly in the cell alongside consensus-site splicing. Moreover, across the different high-throughput approaches (both computational and experimental) applied to determine the abundance of these forms, the signal is consistent and strongly supports the proposed circularization mechanisms. Earlier studies focused on ribo-minus non-poly(A) RNA-sequencing data to identify circular RNA structures in the cell and compared their abundance levels with those of their linear counterparts. Thus far, these studies indicate that circular RNAs are conserved across tissues and species, are not translated, preferentially lack a poly(A) tail, and are one to five exons long. Because much of this initial work was performed using non-poly(A) sequencing, it probably underestimates the abundance of circular RNAs originating from long poly(A) RNA isoforms. Our hypothesis is that if circular RNAs are not artifacts of random events but arise through a structured and defined mechanism, then there should be no bias toward retaining or losing the poly(A) tail when circularized isoforms are formed. We applied an existing computational pipeline from an earlier study by Memczak et al. to long poly(A) RNA-sequencing data from ENCODE cell lines. With the same pipeline, we detected a significant number of circular RNA isoforms in the data, some of which overlap with known circular RNA isoforms from the literature. We also developed an approach to identify the precise structure of circular RNAs, which is not possible with the existing computational approaches.
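Circular RNA detection from sequencing reads generally relies on finding back-splice junctions, where the two ends of a read align to the genome in reversed order compared with a linear transcript. The toy sketch below illustrates only that ordering test. It is not the code of the pipeline used in the thesis, and the `AnchorHit` type, the `classify_junction` function and the coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnchorHit:
    """Alignment of a short anchor taken from one end of a read."""
    chrom: str
    start: int   # 0-based leftmost genomic coordinate of the alignment
    strand: str  # "+" or "-"


def classify_junction(head: AnchorHit, tail: AnchorHit) -> str:
    """Classify a read from the genomic order of its two end anchors.

    `head` comes from the 5' end of the read and `tail` from its 3' end.
    For a linear splice, head precedes tail in transcript direction;
    a back-splice (circular) junction reverses that order.
    """
    if head.chrom != tail.chrom or head.strand != tail.strand:
        return "discordant"
    if head.start == tail.start:
        return "discordant"
    if head.strand == "+":
        in_linear_order = head.start < tail.start
    else:
        # On the minus strand, transcript direction runs toward
        # lower genomic coordinates.
        in_linear_order = head.start > tail.start
    return "linear" if in_linear_order else "backsplice"


# Hypothetical anchors: the read's 5' end maps downstream of its 3' end.
call = classify_junction(
    AnchorHit("chr1", 5000, "+"),
    AnchorHit("chr1", 1200, "+"),
)
```

A real pipeline would additionally extend the anchors to locate the exact breakpoint, check splice-site signals, and filter by read support; none of that is modeled here.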
We aim to study the expression profiles of these circular RNAs in normal and cancer cell lines to determine whether their abundance levels in the cell show any pattern or functional significance.