Swift: An R package for novel, conserved transcript discovery

dc.contributor.author: Howard, Morgan
dc.contributor.author: Quoseena, Mir
dc.contributor.author: Janga, Sarath Chandra
dc.date.accessioned: 2022-04-15T20:39:58Z
dc.date.available: 2022-04-15T20:39:58Z
dc.description: Digitized for IUPUI ScholarWorks inclusion in 2021.
dc.description.abstract: Recent developments in short-read RNA sequencing technologies have enabled transcriptome-wide analysis of both model and non-model organisms. However, transcriptomic annotations derived from short-read technologies have frequently led to bioinformatics challenges such as mis-assembly of transcripts, poor mapping of reads, and incorrect or incomplete annotation of transcribed regions. Hence, there remains a critical need to improve transcriptomic annotations in both human and most model organisms. Fourth-generation single-molecule sequencing technologies such as nanopore and PacBio sequencing enable the discovery of full-length isoforms, including novel transcribed regions. However, RNA-seq data from these platforms are usually available only in unprocessed form, and tools are lacking for discovering sequences conserved between species and for determining the degree to which those sequences are presently annotated. To address this need and to obtain a comprehensive analysis of these newly discovered regions from single-molecule long-read sequencing datasets, we developed Swift, an R package for querying NCBI databases to determine sequence annotation, novelty, and conservation across species. Swift uses a collection of functions to take full advantage of NCBI's local BLAST program to query a locally installed database. Swift extracts the unmapped regions from a BAM file and generates a FASTA file to be used with BLAST. Alternatively, the user can directly supply a FASTA file that already contains unmapped reads. In either case, Swift generates a comprehensive report for the newly identified transcripts in a given experiment. The user can specify the databases to search against, the number of species a sequence must be present in, the percent similarity of the submitted sequences in reference to the BLAST query sequence, and any known annotations for retrieved sequences. Several output files are generated once the analysis is complete, providing a report of the tool's results. [en_US]
dc.identifier.uri: https://hdl.handle.net/1805/28535
dc.title: Swift: An R package for novel, conserved transcript discovery [en_US]
dc.type: Poster [en_US]
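
The abstract describes filtering BLAST results by percent similarity and by the number of species in which a sequence must appear. The following is a minimal sketch of that filtering idea only, written in Python rather than R; it is not Swift's code and does not reflect its actual functions or parameters. The function name conserved_queries, the default thresholds, and the sample hits are hypothetical. The input is assumed to be tab-separated BLAST+ output with four columns, as would be produced by an output format string such as "6 qseqid sseqid pident sscinames".

```python
from collections import defaultdict


def conserved_queries(blast_lines, min_identity=90.0, min_species=2):
    """Return {query_id: sorted species names} for queries found in enough species.

    Assumed column layout (hypothetical, tab-separated):
    query id, subject id, percent identity, subject scientific name(s).
    """
    species_by_query = defaultdict(set)
    for line in blast_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, _sseqid, pident, sscinames = line.split("\t")[:4]
        if float(pident) < min_identity:
            continue  # hit is below the similarity threshold
        # A hit may list several species names separated by ';'
        for name in sscinames.split(";"):
            name = name.strip()
            if name and name != "N/A":
                species_by_query[qseqid].add(name)
    # Keep only queries whose hits span at least min_species species
    return {
        query: sorted(names)
        for query, names in species_by_query.items()
        if len(names) >= min_species
    }


if __name__ == "__main__":
    # Made-up hits for illustration only; not data from the poster.
    hits = [
        "read1\tsubjA\t98.5\tHomo sapiens",
        "read1\tsubjB\t95.0\tMus musculus",
        "read1\tsubjC\t70.0\tDanio rerio",
        "read2\tsubjD\t99.0\tHomo sapiens",
    ]
    print(conserved_queries(hits))
    # prints {'read1': ['Homo sapiens', 'Mus musculus']}
```

With the default thresholds, read1 is retained because it has sufficiently similar hits in two species, while read2 is dropped because it matches only one. Lowering min_species to 1 would keep both reads.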
Files
Original bundle
Name: BISP-Howard&Quoseena-OCR.pdf
Size: 648.38 KB
Format: Adobe Portable Document Format