Govindaraman, Aniruddhan; Quoseena, Mir; Kadumuri, Raja Shekar Vanna; Srivastava, Mansi; Srivastava, Rajneesh; Janga, Sarath Chandra
2022-04-15
2022-04-15
https://hdl.handle.net/1805/28532
Digitized for IUPUI ScholarWorks inclusion in 2021.
The 3' endonucleolytic cleavage of pre-messenger RNA (pre-mRNA) and its subsequent polyadenylation constitute a fundamental cellular process in eukaryotes. The 3' terminal regions are known to be polyadenylated by canonical poly(A) polymerases during RNA processing of messenger RNA (mRNA) molecules; however, they are also known to harbor additional UnMapped Regions (UMRs) that include uridylation and guanylation [1]. Although short-read sequencing technologies are extensively used to study 3' terminal regions, major limitations of these approaches include their inability to detect homopolymeric sequences and to sequence full-length isoforms [1-2]. Nanopore sequencing enables long-read sequencing and identification of full-length transcripts at single-molecule resolution; however, there are currently no tools for systematically analyzing 3' terminal UMRs from direct RNA-sequencing datasets. Here, we present RAPTOR (https://github.com/aniram118/RAPTOR), a command-line tool for 3' terminal UMR analysis of nanopore direct RNA-sequencing data. RAPTOR provides a comprehensive report of UMR sequence information, cognate transcript annotations, nucleotide base composition, conserved hexamer signals, and a range of analysis plots at single-molecule resolution. For benchmarking, we sequenced mRNA samples obtained from HepG2 (liver hepatocellular carcinoma) and K562 (bone marrow chronic myelogenous leukemia) cell lines, resulting in 243,802 and 598,428 reads, respectively. RAPTOR identified high-quality UMRs with median lengths of 201 and 173 nt in the HepG2 and K562 transcriptomes, respectively.
Nucleotide composition analysis of the identified 3' UMRs showed an enrichment for A and U nucleotides in both HepG2 [A: 29%, U: 28%, G: 20%, C: 23%] and K562 [A: 30%, U: 29%, G: 19%, C: 22%] cells. Several high-confidence UMRs were verified by qPCR and Sanger sequencing, confirming sequence length and identity, respectively. In addition, de novo motif analysis of UMRs enabled the discovery of several noncanonical motifs beyond poly(A/U) patterns. These UMR motifs were significantly (p-value < 0.01) associated with the established binding motifs of several known RNA-binding proteins (RBPs), including SART3, HuR (ELAVL1), TIA1, IGF2BP2/3, PABPCs, PCBPs, SRSFs, HNRNPs and RBM /6, suggesting an unappreciated role of these RBPs in binding to the 3' tails of mRNAs.
Direct RNA-sequencing of human cell lines for transcriptome-wide mapping and annotation of 3' tails at single molecule resolution
Poster
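As an illustration of the summary statistics reported in the abstract (pooled nucleotide composition and median UMR length), the following is a minimal Python sketch. It is not RAPTOR's implementation: the function names `base_composition` and `median_length` and the toy sequences are hypothetical, and it assumes the UMR sequences have already been extracted as plain strings.

```python
from collections import Counter
from statistics import median


def base_composition(seqs):
    """Return the pooled percentage of A, U, G and C across all sequences.

    Hypothetical helper: T is treated as U so DNA-style basecalls and
    RNA-style sequences are counted the same way.
    """
    counts = Counter()
    for s in seqs:
        counts.update(s.upper().replace("T", "U"))
    total = sum(counts[b] for b in "AUGC")
    if total == 0:
        return {b: 0.0 for b in "AUGC"}
    return {b: 100.0 * counts[b] / total for b in "AUGC"}


def median_length(seqs):
    """Return the median sequence length (raises on an empty input)."""
    return median(len(s) for s in seqs)


# Toy sequences for illustration only; these are not data from the study.
toy_umrs = ["AAUU", "GGCC", "AUAUAUAU"]
composition = base_composition(toy_umrs)
med = median_length(toy_umrs)
print(composition, med)
```

Pooling counts across all reads weights longer UMRs more heavily than averaging per-read percentages would; which convention the reported figures use is not stated in the abstract.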