Direct RNA-sequencing of human cell lines for transcriptome-wide mapping and annotation of 3' tails at single molecule resolution

Abstract

The 3' endonucleolytic cleavage of pre-messenger RNA (pre-mRNA) and subsequent polyadenylation is a fundamental cellular process in eukaryotes. The 3' terminal regions of messenger RNA (mRNA) molecules are known to be polyadenylated by canonical poly(A) polymerases during RNA processing; however, they are also known to harbor additional UnMapped Regions (UMRs) composed of uridylation and guanylation [1]. Although short-read sequencing technologies are extensively used to study 3' terminal regions, major limitations of these approaches include their inability to detect homopolymeric sequences and to sequence full-length isoforms [1-2]. Nanopore sequencing enables long-read sequencing and identification of full-length transcripts at single-molecule resolution; however, there are currently no tools for systematically analyzing 3' terminal UMRs from direct RNA-sequencing datasets. Here, we present RAPTOR (https://github.com/aniram118/RAPTOR), a command-line tool for 3' terminal UMR analysis of nanopore direct RNA-sequencing data. RAPTOR provides a comprehensive report of UMR sequence information, cognate transcript annotations, nucleotide base composition, conserved hexamer signals and a range of analysis plots at single-molecule resolution. For benchmarking, we sequenced mRNA samples obtained from HepG2 (liver hepatocellular carcinoma) and K562 (bone marrow chronic myelogenous leukemia) cell lines, resulting in 243,802 and 598,428 reads, respectively. RAPTOR identified high-quality UMRs with median lengths of 201 and 173 nt in the HepG2 and K562 transcriptomes, respectively. Nucleotide composition analysis of the identified 3' UMRs showed an enrichment for A and U nucleotides in both HepG2 [A: 29%, U: 28%, G: 20%, C: 23%] and K562 [A: 30%, U: 29%, G: 19%, C: 22%] cells. Several high-confidence UMRs were verified by qPCR and Sanger sequencing, confirming sequence length and identity, respectively.
In addition, de novo motif analysis of UMRs enabled the discovery of several noncanonical motifs beyond poly A/U patterns. These UMR motifs were found to be significantly associated (p-value < 0.01) with the established binding motifs of several known RNA-binding proteins (RBPs), including SART3, HuR (ELAVL1), TIA1, IGF2BP2/3, PABPCs, PCBPs, SRSFs, HNRNPs and RBM /6, suggesting an unappreciated role of these RBPs in binding to 3' tails of mRNAs.
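
To make the nucleotide composition and hexamer analyses mentioned above concrete, the following is a minimal Python sketch of how such summaries could be computed from a list of UMR sequences. It is an illustration only and is not taken from RAPTOR; the function names `base_composition` and `hexamer_counts` are hypothetical, and the sketch assumes the sequences are plain strings in which any T is treated as U.

```python
from collections import Counter


def base_composition(seqs):
    """Pooled percentage of A, C, G and U over all sequences.

    Hypothetical helper, not RAPTOR's implementation. T is counted as U,
    and characters other than A/C/G/U (e.g. N) are ignored.
    """
    counts = Counter()
    for s in seqs:
        counts.update(s.upper().replace("T", "U"))
    total = sum(counts[b] for b in "ACGU")
    if total == 0:
        return {b: 0.0 for b in "ACGU"}
    return {b: 100.0 * counts[b] / total for b in "ACGU"}


def hexamer_counts(seqs, k=6):
    """Count every overlapping k-mer (k=6 by default) across all sequences.

    Hypothetical helper; sequences shorter than k contribute nothing.
    """
    counts = Counter()
    for s in seqs:
        s = s.upper().replace("T", "U")
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts
```

For example, `base_composition(umr_sequences)` would return a dictionary of percentages comparable in form to the per-cell-line values reported above, and `hexamer_counts(umr_sequences).most_common(10)` would list the ten most frequent hexamers.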

Description
Digitized for IUPUI ScholarWorks inclusion in 2021.
Type
Poster