Prediction and Evolutionary Analysis of RNA Binding Proteins Across Eukaryotic Genomes

dc.contributor.author: Hassan, Huzaifa
dc.contributor.author: Janga, Sarath Chandra
dc.date.accessioned: 2022-04-15T20:39:19Z
dc.date.available: 2022-04-15T20:39:19Z
dc.description: Digitized for IUPUI ScholarWorks inclusion in 2021.
dc.description.abstract: RNA Binding Proteins (RBPs) are key players in several post-transcriptional regulatory mechanisms and mediate the metabolism of RNA in the cell. High-throughput technologies such as cross-linking followed by Mass Spectrometry (MS) have led to the identification of a large number of RBPs and the RNA binding domains (RBDs) they encode. Although experimental methods have expanded the known repertoire of RBPs in model systems, the catalog of RBPs across eukaryotic species remains far from complete. In this study, we developed a computational pipeline to predict RNA binding proteins using RNA binding domains and protein homology information. Our approach used RNA-binding peptides from 529 RBPs and a dataset of 1344 experimentally known human RBPs as a reference set. Domain-based predictions using HMMER were integrated with homology information to obtain a genome-wide prediction of RBPs across 69 species. Benchmarking these predictions against mouse genes annotated as RBPs resulted in a precision of 60% and a recall of 75%. An average of 1750 RBPs were identified across eukaryotes comprising mammals, birds, amphibians, insects and worms. Although RBPs were found to be highly conserved across the phylogenetic spectrum, a few lower-order species such as lamprey, Caenorhabditis elegans and yeast exhibited fewer RBPs encoded in their genomes, suggestive of the divergence of the RBP repertoire in distant relatives. In contrast to Transcription Factors (TFs) and kinases, genes encoding RBPs exhibited an increase in number (p-value: 0.0013) with increasing genome size. Although the majority (56%) of the RNA binding regions could be mapped to domains present in the Pfam database, a small fraction of the unmapped novel domains was detected in >1% of the protein-coding genes analyzed across genomes. A co-occurrence network of RBDs revealed prominent enrichment of Nup160, WD40 and RRM domains with other RBDs across eukaryotic genomes. Our proposed prediction pipeline and the corresponding repertoire of RBPs should serve as a valuable resource for studying post-transcriptional regulatory networks across eukaryotic species. (en_US)
dc.identifier.uri: https://hdl.handle.net/1805/28534
dc.title: Prediction and Evolutionary Analysis of RNA Binding Proteins Across Eukaryotic Genomes (en_US)
dc.type: Poster (en_US)
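
The abstract describes integrating domain-based and homology-based predictions and then benchmarking them against annotated RBPs. The sketch below illustrates that kind of calculation only; it is not the authors' pipeline. The function names are hypothetical, the gene identifiers are toy values chosen so the ratios come out at 60% and 75%, and combining the two prediction sets by union is an assumption, since the abstract does not state how the integration was done.

```python
def integrate_predictions(domain_hits, homology_hits):
    """Combine two sets of predicted RBP gene IDs.

    Assumption: a simple union. The abstract does not say whether the
    authors used a union, an intersection, or a scored combination.
    """
    return set(domain_hits) | set(homology_hits)


def precision_recall(predicted, annotated):
    """Return (precision, recall) of predicted IDs against annotated IDs."""
    predicted, annotated = set(predicted), set(annotated)
    true_positives = len(predicted & annotated)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(annotated) if annotated else 0.0
    return precision, recall


# Toy identifiers, not real data.
domain_hits = {"geneA", "geneB", "geneC"}      # e.g. genes with a domain match
homology_hits = {"geneC", "geneD", "geneE"}    # e.g. homologs of reference RBPs
annotated = {"geneA", "geneC", "geneD", "geneF"}  # benchmark annotation

predicted = integrate_predictions(domain_hits, homology_hits)
precision, recall = precision_recall(predicted, annotated)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is the share of predicted genes that are annotated as RBPs, and recall is the share of annotated RBPs that the prediction recovers, so a looser integration rule such as a union tends to raise recall at the cost of precision.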
Files
Original bundle
Name: BISP-Hassan&Janga-OCR.pdf
Size: 1.66 MB
Format: Adobe Portable Document Format