R(A)PTOR - A tool for systematic identification of Poly(A) tails and 3' unmapped regions from single molecule direct RNA-sequencing datasets

dc.contributor.author: Govindaraman, Aniruddhan
dc.contributor.author: Kadumuri, Raja Shekar Varma
dc.contributor.author: Quoseena, Mir
dc.contributor.author: Janga, Sarath Chandra
dc.date.accessioned: 2022-04-15T20:34:54Z
dc.date.available: 2022-04-15T20:34:54Z
dc.description: Digitized for IUPUI ScholarWorks inclusion in 2021.
dc.description.abstract (en_US): The 3' cleavage of pre-messenger RNA (pre-mRNA) and subsequent polyadenylation is a fundamental cellular process in eukaryotes. The poly(A) tail is generally described as a long chain of adenine nucleotides added to the 3' end of a messenger RNA (mRNA) molecule during RNA processing; however, the terminal 3' region is also known to harbor additional unmappable regions (UMRs) that include uridylation and guanylation [1]. Although short-read sequencing technologies are extensively used to study 3' terminal poly(A) regions, the major drawback of these technologies is their inability to detect full-length homopolymeric sequences [1] [2]. Recent long-read sequencing technologies such as Nanopore sequencing enable sequencing of full-length transcripts at single-molecule resolution; however, there are currently no tools for systematically analyzing 3' terminal unmapped regions from direct RNA-sequencing datasets. We present RAPTOR (https://github.com/aniram118/RAPTOR), a command-line tool for 3' terminal unmapped region analysis of nanopore direct RNA sequencing data. RAPTOR provides a comprehensive report of UMR lengths, sequences, conserved poly(A) hexamer regions, nucleotide base composition, and transcript versus UMR length correlation at single-molecule resolution. In our benchmarking studies, we sequenced mRNA samples obtained from the HepG2 (liver hepatocellular carcinoma) and K562 (bone marrow chronic myelogenous leukemia) cell lines, resulting in 243,802 and 598,428 reads, respectively. RAPTOR-identified UMRs exhibited a median length of 50-100 nt, in agreement with previous studies [1]. Our results also support an enrichment of previously known conserved poly(A) hexamers [3].
Nucleotide composition analysis of the identified 3' UMRs showed an enrichment for A and U nucleotides in both HepG2 [A: 29%, U: 28%, G: 20%, C: 23%] and K562 [A: 30%, U: 29%, G: 19%, C: 22%]. Interestingly, guanylation was observed in the upstream and downstream regions of UMRs, while uridylation occurred more often in the central regions, suggesting characteristic roles in mRNA stability. In addition, conserved motif analysis of UMRs followed by RNA-binding protein (RBP) binding site analysis identified several RBPs, including HNRPK, PCB2, SART SRSF9, HNRPR and RBM4, as enriched in the unmapped regions, suggesting an unappreciated role for these RBPs in binding the 3' tails of mRNAs.
dc.identifier.uri: https://hdl.handle.net/1805/28531
dc.title (en_US): R(A)PTOR - A tool for systematic identification of Poly(A) tails and 3' unmapped regions from single molecule direct RNA-sequencing datasets
dc.type (en_US): Poster
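
The abstract mentions per-read reports of nucleotide base composition and conserved poly(A) hexamers. The following is only a minimal Python sketch of what such a per-sequence summary could look like; it is not RAPTOR's code. The function names `base_composition` and `find_hexamers` are hypothetical, and the hexamer set (AAUAAA and AUUAAA, two well-known poly(A) signal hexamers) is an assumption, since the abstract does not list the hexamers RAPTOR reports.

```python
from collections import Counter

# Assumption: two well-known poly(A) signal hexamers. The abstract does not
# say which hexamers RAPTOR itself searches for.
POLYA_HEXAMERS = ("AAUAAA", "AUUAAA")


def _to_rna(seq):
    """Upper-case the sequence and treat T as U, since basecallers may emit either."""
    return seq.upper().replace("T", "U")


def base_composition(seq):
    """Return the percentage of A, U, G and C in a sequence.

    Characters other than A/U/G/C (for example N) are ignored.
    An empty sequence yields 0.0 for every base.
    """
    counts = Counter(base for base in _to_rna(seq) if base in "AUGC")
    total = sum(counts.values())
    if total == 0:
        return {base: 0.0 for base in "AUGC"}
    return {base: 100.0 * counts[base] / total for base in "AUGC"}


def find_hexamers(seq, hexamers=POLYA_HEXAMERS):
    """Return (0-based position, hexamer) pairs for every match, in order."""
    rna = _to_rna(seq)
    hits = []
    for start in range(len(rna) - 5):
        window = rna[start:start + 6]
        if window in hexamers:
            hits.append((start, window))
    return hits


if __name__ == "__main__":
    example_umr = "GGCAAUAAAGUUUUCUUAUUAAAGG"  # made-up sequence, not real data
    print(base_composition(example_umr))
    print(find_hexamers(example_umr))
```

Applied to every unmapped 3' sequence in a dataset, summaries of this kind could be aggregated into sample-level percentages like those quoted in the abstract, although the actual aggregation RAPTOR performs is not described there.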
Files
Original bundle
Name: BISP-Govindaraman&Kadumuri-OCR.pdf
Size: 1.79 MB
Format: Adobe Portable Document Format