Browsing by Author "Quoseena, Mir"
Item Alternative Splicing Profile Comparison of Differentiating T-helper Cells to Dissect the Splicing Signatures of Th1, Th2, Th17 and Treg Cells
Lakshmipati, Deepak Kumar; Quoseena, Mir; Ulrich, Benjamin; Kaplan, Mark; Janga, Sarath Chandra
This study focuses on the contribution of Alternative Splicing (AS) events to the differentiation and post-differentiation functions of T-helper cells, specifically Th1, Th2, Th9, Th17 and Treg cells. T cell RNA-seq data from the 72 hour and 2 week post differentiation time points were analyzed for alternative splicing events using rMATS. We observed that the majority of the significant events were Skipped Exon (SE) events, originating from a total of 1,556 genes, and that Intron Retention (RI) events were the second most abundant, occurring in 1,254 genes at 72 hours post differentiation. These numbers were significantly lower at 2 weeks post differentiation. PCR and qPCR validations confirmed scores of novel splicing event predictions. Results showed several SE events in KTN1, IL4RA, IL27, Hnrnpd, CREM and Arid4b, with different mRNA isoforms across multiple naïve vs differentiated T cell combinations. Overall, RI event associated genes were more prevalent (3,239 genes) than those exhibiting SE (2,810 genes). SE events were associated with 10.8% (Th17), 11.2% (Treg), 12.1% (Th2) and 13.9% (Th1) of the genes; a similar trend was observed for RI events, with a prevalence of 12.2% (Th17), 12.5% (Treg), 14.2% (Th1) and 14.4% (Th2) of the genes. Gene ontology results showed that most of the genes with SE and RI events are involved in processes such as 'mRNA Processing', 'RNA Processing' and 'RNA Binding'; ontology results for retained introns also showed p53 suppression proteins and regulated exocytosis of neurotransmitters and hormones.
It was also observed that introns were consistently retained more often at the 3' end of the gene than at the 5' end, with 430 genes showing intron retention events at the 3' end and 21 genes exhibiting them at the 5' end at the 72 hour time point. Across all cell types, the enriched functional ontologies were consistently exclusive to either the genes showing RI at the 5' end or those showing RI at the 3' end.

Item Direct RNA-sequencing of human cell lines for transcriptome-wide mapping and annotation of 3' tails at single molecule resolution
Govindaraman, Aniruddhan; Quoseena, Mir; Kadumuri, Raja Shekar Varma; Srivastava, Mansi; Srivastava, Rajneesh; Janga, Sarath Chandra
The 3' endonucleolytic cleavage of pre-messenger RNA (pre-mRNA) and successive polyadenylation is a fundamental cellular process in eukaryotes. The 3' terminal regions are known to be polyadenylated by canonical poly(A) polymerases during RNA processing of messenger RNA (mRNA) molecules; however, they are also known to harbor additional UnMapped Regions (UMRs) composed of uridylation and guanylation [1]. Although short read sequencing technologies are extensively used to study 3' terminal regions, major limitations of these approaches include their inability to detect homopolymeric sequences and to sequence full length isoforms [1-2]. Nanopore sequencing enables long read sequencing and identification of full length transcripts at single molecule resolution; however, there are currently no tools for systematically analyzing 3' terminal UMRs from direct RNA-sequencing datasets. Here, we present RAPTOR (https://github.com/aniram118/RAPTOR), a command line tool for 3' terminal UMR analysis of nanopore direct RNA sequencing data. RAPTOR provides a comprehensive report of UMR sequence information, cognate transcript annotations, nucleotide base composition, conserved hexamer signals and a range of analysis plots at single molecule resolution.
For benchmarking, we sequenced mRNA samples obtained from HepG2 (Liver Hepatocellular Carcinoma) and K562 (Bone Marrow Chronic Myelogenous Leukemia) cell lines, resulting in 243,802 and 598,428 reads respectively. The high quality UMRs identified by RAPTOR exhibited median lengths of 201 and 173 nt in the HepG2 and K562 transcriptomes respectively. Nucleotide composition analysis of the identified 3' UMRs showed an enrichment for A and U nucleotides in both HepG2 [A: 29%, U: 28%, G: 20%, C: 23%] and K562 [A: 30%, U: 29%, G: 19%, C: 22%] cells. Several high confidence UMRs were verified by qPCR and Sanger sequencing, confirming sequence length and identity, respectively. In addition, de novo motif analysis of UMRs enabled the discovery of several noncanonical motifs beyond Poly A/U patterns. These UMR motifs were found to be significantly (p-value < 0.01) associated with the established binding motifs of several known RNA Binding Proteins (RBPs) including SART3, HuR (ELAVL1), TIA1, IGF2BP2/3, PABPCs, PCBPs, SRSFs, HNRNPs and RBM /6, suggesting an unappreciated role of these RBPs in binding to the 3' tails of mRNAs.

Item R(A)PTOR - A tool for systematic identification of Poly(A) tails and 3' unmapped regions from single molecule direct RNA-sequencing datasets
Govindaraman, Aniruddhan; Kadumuri, Raja Shekar Varma; Quoseena, Mir; Janga, Sarath Chandra
The 3' cleavage of pre-messenger RNA (pre-mRNA) and successive polyadenylation is a fundamental cellular process in eukaryotes. Studies describe the poly(A) tail as a long chain of adenine nucleotides added to the 3' terminus of a messenger RNA (mRNA) molecule during RNA processing; however, the terminal 3' region is also known to harbor additional unmappable regions (UMRs) composed of uridylation and guanylation [1]. Although short read sequencing technologies are extensively used to study 3' terminal poly(A) regions, their major drawback lies in their inability to detect full length homopolymeric sequences [1] [2].
Recent long read sequencing technologies such as Nanopore sequencing enable sequencing of full length transcripts at single molecule resolution; however, there are currently no tools for systematically analyzing 3' terminal unmapped regions from direct RNA-sequencing datasets. We present RAPTOR (https://github.com/aniram118/RAPTOR), a command line tool for 3' terminal unmapped region analysis of nanopore direct RNA sequencing data. RAPTOR provides a comprehensive report of UMR lengths, sequences, conserved polyA hexamer regions, nucleotide base composition and transcript vs UMR length correlation analysis at single molecule resolution. In our benchmarking studies, we sequenced mRNA samples obtained from HepG2 (Liver Hepatocellular Carcinoma) and K562 (Bone Marrow Chronic Myelogenous Leukemia) cell lines, resulting in 243,802 and 598,428 reads respectively. The UMRs identified by RAPTOR exhibited a median length of 50-100 nt, in agreement with previous studies [1]. Our results also support an enrichment of previously known conserved polyA hexamers [3]. Nucleotide composition analysis of the identified 3' UMRs showed an enrichment for A and U nucleotides in both HepG2 [A: 29%, U: 28%, G: 20%, C: 23%] and K562 [A: 30%, U: 29%, G: 19%, C: 22%]. Interestingly, guanylation was observed in the upstream and downstream regions of UMRs, while uridylation was found to occur more in the central regions, suggesting a characteristic role for each in mRNA stability.
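As a rough illustration of the nucleotide composition analysis mentioned in this abstract, the following Python sketch pools a set of UMR sequences and reports the percentage of each base. It is not RAPTOR's implementation: the function name `base_composition` and the example sequences are hypothetical, and the sketch assumes that T in the input should be counted as U.

```python
from collections import Counter


def base_composition(sequences):
    """Return the percentage of A, U, G and C pooled over all sequences.

    Hypothetical helper for illustration only; not part of RAPTOR.
    """
    counts = Counter()
    for seq in sequences:
        # Assumption: reads may be reported with T in place of U.
        counts.update(seq.upper().replace("T", "U"))
    # Only the four canonical bases contribute to the total.
    total = sum(counts[base] for base in "AUGC")
    if total == 0:
        return {base: 0.0 for base in "AUGC"}
    return {base: 100.0 * counts[base] / total for base in "AUGC"}


# Made-up example sequences, not data from the study.
example_umrs = ["AAUUGC", "aattgc", "AUAU"]
composition = base_composition(example_umrs)
print(composition)
```

Pooling all bases before dividing weights longer UMRs more heavily; averaging per-sequence percentages instead would be an equally plausible choice, and the abstract does not say which was used.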
In addition, conserved motif analysis of UMRs followed by RBP binding site analysis identified several RBPs, including HNRPK, PCB2, SART, SRSF9, HNRPR and RBM4, as enriched in the unmapped regions, suggestive of an unappreciated role of these RBPs in binding to the 3' tails of mRNAs.

Item Swift: An R package for novel, conserved transcript discovery
Howard, Morgan; Quoseena, Mir; Janga, Sarath Chandra
Recent developments in short read RNA sequencing technologies have enabled transcriptome wide analysis of both model and non-model organisms. However, transcriptomic annotations resulting from short read technologies have frequently led to several bioinformatics challenges, such as mis-assembly of transcripts, poor mapping of reads, and incorrect or incomplete annotation of the transcribed regions. Hence, there exists a critical gap in improving the transcriptomic annotations in both human and most model organisms. Fourth generation single molecule sequencing technologies such as Nanopore and PacBio sequencing enable the discovery of full-length isoforms, including novel transcribed regions. However, RNA-seq data from these platforms are usually available in an unprocessed form, and there is a lack of tools for discovering sequences conserved between species and for determining the degree to which those sequences are presently annotated. To address this need and to obtain a comprehensive analysis of these newly discovered regions from single molecule long read sequencing datasets, we developed Swift, an R package for querying NCBI databases to determine sequence annotation, novelty and conservation across species. Swift uses a collection of functions to take full advantage of NCBI's local BLAST program to query a locally installed database. Swift extracts the unmapped regions from the BAM file and generates a FASTA file to be used with BLAST.
The user can also directly supply a FASTA file already containing unmapped reads as an additional input option, to generate a comprehensive report for the newly identified transcripts in a given experiment. The user can specify the databases to search against, the number of species a sequence must be present in, the percent similarity of the submitted sequences in reference to the BLAST query sequence, and any known annotations for retrieved sequences. Several output files are generated after the analysis is complete to provide a report of the tool's analysis.
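The extraction step described in this abstract (unmapped reads taken from an alignment file and written as FASTA for BLAST) can be sketched as follows. This is a minimal Python illustration, not Swift's R code. It assumes the alignments are available as SAM text rather than binary BAM, that "unmapped" means reads whose SAM FLAG has bit 0x4 set, and the function name `unmapped_to_fasta` and the example records are hypothetical.

```python
def unmapped_to_fasta(sam_lines):
    """Convert unmapped SAM records to FASTA text.

    Hypothetical helper for illustration only; not part of the Swift package.
    """
    records = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # a SAM alignment record has 11 mandatory fields
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        # Bit 0x4 of FLAG marks the read as unmapped; "*" means no sequence is stored.
        if flag & 0x4 and seq != "*":
            records.append(f">{name}\n{seq}")
    return "\n".join(records) + ("\n" if records else "")


# Made-up SAM records for demonstration.
example_sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tGGGTTTAA\tIIIIIIII",
]
fasta_text = unmapped_to_fasta(example_sam)
print(fasta_text, end="")
```

The resulting FASTA text could then be saved to a file and supplied to a local BLAST search, which corresponds to the second input option the abstract mentions (a FASTA file that already contains the unmapped reads).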