Swift: An R package for novel, conserved transcript discovery

Abstract

Recent developments in short-read RNA sequencing technologies have enabled transcriptome-wide analysis of both model and non-model organisms. However, transcriptomic annotations derived from short-read technologies have frequently led to bioinformatics challenges such as mis-assembly of transcripts, poor mapping of reads, and erroneous or incomplete annotation of transcribed regions. Hence, there is a critical need to improve the transcriptomic annotations of both human and most model organisms. Fourth-generation single-molecule sequencing technologies such as Nanopore and PacBio sequencing enable the discovery of full-length isoforms, including novel transcribed regions. However, RNA-seq data from these platforms are usually available in an unprocessed form, and tools are lacking for discovering sequences conserved between species and for determining the degree to which those sequences are currently annotated. To address this need and to obtain a comprehensive analysis of these newly discovered regions from single-molecule long-read sequencing datasets, we developed Swift, an R package for querying NCBI databases to determine sequence annotation, novelty, and conservation across species. Swift uses a collection of functions to take full advantage of NCBI's local BLAST program to query a locally installed database. Swift extracts the unmapped regions from a BAM file and generates a FASTA file to be used with BLAST. Alternatively, the user can directly supply a FASTA file that already contains unmapped reads. In either case, Swift generates a comprehensive report for the newly identified transcripts in a given experiment. The user can specify the databases to search against, the number of species a sequence must be present in, the percent similarity of the submitted sequences relative to the BLAST query sequence, and any known annotations for retrieved sequences. Several output files are generated after the analysis is complete to provide a report of the tool's analysis.
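
The filtering step described above (a minimum percent similarity and a minimum number of species per sequence) can be illustrated with a short sketch. This is not Swift's R interface, which the abstract does not specify. It is a hypothetical Python example that assumes BLAST hits were written in tabular form with four tab-separated columns: query ID, subject ID, percent identity, and subject scientific name (for instance via a custom `-outfmt 6` column list). The function name `conserved_queries` and its parameters are illustrative assumptions.

```python
from collections import defaultdict


def conserved_queries(blast_lines, min_identity=80.0, min_species=2):
    """Return {query_id: sorted species names} for queries whose hits
    reach `min_identity` percent identity in at least `min_species`
    distinct species.

    Assumed (hypothetical) input: tab-separated lines with the columns
    query ID, subject ID, percent identity, scientific name.
    """
    species_by_query = defaultdict(set)
    for line in blast_lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        qseqid, _sseqid, pident, sciname = line.split("\t")[:4]
        # Keep only hits that meet the similarity threshold.
        if float(pident) >= min_identity:
            species_by_query[qseqid].add(sciname)
    # Keep only queries seen in enough distinct species.
    return {
        query: sorted(species)
        for query, species in species_by_query.items()
        if len(species) >= min_species
    }


if __name__ == "__main__":
    # Made-up hits for illustration only; not data from the study.
    example_hits = [
        "read1\tsubjA\t95.0\tHomo sapiens",
        "read1\tsubjB\t91.2\tMus musculus",
        "read1\tsubjC\t60.0\tDanio rerio",
        "read2\tsubjD\t99.0\tHomo sapiens",
    ]
    for query, species in conserved_queries(example_hits).items():
        print(query, ", ".join(species))
```

A query with no hits passing these filters would be a candidate novel sequence, whereas one retained here would count as conserved under the chosen thresholds. Both thresholds are user-supplied in the workflow the abstract describes, so the defaults shown are arbitrary.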

Description
Digitized for IUPUI ScholarWorks inclusion in 2021.
Type
Poster