Swift: An R package for novel, conserved transcript discovery

Howard, Morgan; Quoseena, Mir; Janga, Sarath Chandra

dc.contributor.author	Howard, Morgan
dc.contributor.author	Quoseena, Mir
dc.contributor.author	Janga, Sarath Chandra
dc.date.accessioned	2022-04-15T20:39:58Z
dc.date.available	2022-04-15T20:39:58Z
dc.description	Digitized for IUPUI ScholarWorks inclusion in 2021.
dc.description.abstract	Recent developments in short-read RNA sequencing technologies have enabled transcriptome-wide analysis of both model and non-model organisms. However, transcriptomic annotations resulting from short-read technologies have frequently led to several bioinformatics challenges, such as mis-assembly of transcripts, poor mapping of reads, and incorrect or incomplete annotation of transcribed regions. Hence, there remains a critical need to improve the transcriptomic annotations of both human and most model organisms. Fourth-generation single-molecule sequencing technologies such as nanopore and PacBio sequencing enable the discovery of full-length isoforms, including novel transcribed regions. However, RNA-seq data from these platforms are usually available in an unprocessed form, and there is a lack of tools for discovering sequences conserved between species and determining the degree to which those sequences are presently annotated. To address this need and to obtain a comprehensive analysis of these newly discovered regions from single-molecule long-read sequencing datasets, we developed Swift, an R package for querying NCBI databases to determine sequence annotation, novelty, and conservation across species. Swift uses a collection of functions to take full advantage of NCBI's local BLAST program to query a locally installed database. Swift extracts the unmapped regions from an input BAM file and generates a FASTA file to be used with BLAST. Alternatively, the user can directly supply a FASTA file already containing unmapped reads, to generate a comprehensive report for the newly identified transcripts in a given experiment. The user can specify the databases to search against, the number of species a sequence must be present in, the percent similarity of the submitted sequences in reference to the BLAST query sequence, and any known annotations for retrieved sequences.
Several output files are generated after the analysis is complete to provide a report of the tool's analysis.	en_US
dc.identifier.uri	https://hdl.handle.net/1805/28535
dc.title	Swift: An R package for novel, conserved transcript discovery	en_US
dc.type	Poster	en_US
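
The filtering step described in the abstract (a minimum percent similarity and a minimum number of species in which a sequence must be present) can be illustrated with a minimal sketch. This is not Swift's R code or its API: the function name `conserved_queries`, its parameters, and the simplified hit tuples are all hypothetical, and Python is used only to keep the example self-contained.

```python
from collections import defaultdict


def conserved_queries(hits, min_identity=90.0, min_species=2):
    """Return query IDs that pass the identity cutoff in enough species.

    `hits` is an iterable of (query_id, species, percent_identity) tuples.
    This layout is an assumed simplification of tabular BLAST results,
    not the format Swift itself reads or writes.
    """
    species_by_query = defaultdict(set)
    for query_id, species, percent_identity in hits:
        # Keep only hits that meet the similarity threshold.
        if percent_identity >= min_identity:
            species_by_query[query_id].add(species)
    # A query counts as conserved if enough distinct species had a passing hit.
    return sorted(
        query_id
        for query_id, species in species_by_query.items()
        if len(species) >= min_species
    )


# Made-up example hits, for illustration only.
hits = [
    ("read_1", "Homo sapiens", 98.5),
    ("read_1", "Mus musculus", 93.0),
    ("read_2", "Homo sapiens", 99.1),
    ("read_2", "Mus musculus", 71.4),
    ("read_3", "Danio rerio", 95.0),
]

conserved = conserved_queries(hits, min_identity=90.0, min_species=2)
print(conserved)
```

With these made-up hits, only `read_1` passes the 90% cutoff in two distinct species; lowering `min_identity` or `min_species` admits more queries.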

Files

Original bundle

Name: BISP-Howard&Quoseena-OCR.pdf
Size: 648.38 KB
Format: Adobe Portable Document Format

Collections

Department of Biomedical Engineering and Informatics Works
Sarath Janga