Browsing by Author "Zhu, Daming"
IsoTree: A New Framework for De novo Transcriptome Assembly from RNA-seq Reads
(IEEE, 2018-02) Zhao, Jin; Feng, Haodi; Zhu, Daming; Zhang, Chi; Xu, Ying; Medical and Molecular Genetics, School of Medicine
High-throughput sequencing of mRNA has made deep and efficient probing of the transcriptome more affordable. However, the vast number of short RNA-seq reads makes de novo transcriptome assembly an algorithmic challenge. In this work, we present IsoTree, a novel framework for transcript reconstruction in the absence of reference genomes. Unlike most de novo assembly methods, which build a de Bruijn graph or splicing graph by connecting $k$-mers (sets of overlapping substrings generated from reads), IsoTree constructs the splicing graph by connecting reads directly. For each splicing graph, IsoTree applies an iterative mixed integer linear programming scheme to build a prefix tree, called an isoform tree. Each path from the root node of the isoform tree to a leaf node represents a plausible transcript candidate, and these candidates are pruned based on paired-end read information. Experiments showed that in most cases IsoTree performs better than other leading transcriptome assembly programs. IsoTree is available at https://github.com/Jane110111107/IsoTree.

The Longest Common Exemplar Subsequence Problem
(IEEE, 2018-12) Zhang, Shu; Wang, Ruizhi; Zhu, Daming; Jiang, Haitao; Feng, Haodi; Guo, Jiong; Liu, Xiaowen; BioHealth Informatics, School of Informatics and Computing
In this paper, we propose to find order-conserved subsequences of genomes by finding longest common exemplar subsequences of the genomes. The longest common exemplar subsequence problem, given two genomes, asks for a common exemplar subsequence of them of maximum length. We focus on genomes whose genes of the same gene family are in at most $s$ spans. We propose a dynamic programming algorithm with time complexity $O(s 4^s mn)$ to find a longest common exemplar subsequence of two genomes with one genome admitting $s$ span genes of the same gene family, where $m$ and $n$ stand for the gene numbers of the two given genomes. Our algorithm can be extended to find longest common exemplar subsequences of more than two genomes.

A Spectrum Graph-Based Protein Sequence Filtering Algorithm for Proteoform Identification by Top-Down Mass Spectrometry
(IEEE, 2017-11) Yang, Runmin; Zhu, Daming; Kou, Qiang; Bhat-Nakshatri, Poornima; Nakshatri, Harikrishna; Wu, Si; Liu, Xiaowen; BioHealth Informatics, School of Informatics and Computing
Database search is the main approach for identifying proteoforms using top-down tandem mass spectra. However, it is extremely slow to align a query spectrum against all protein sequences in a large database when the target proteoform that produced the spectrum contains post-translational modifications and/or mutations. As a result, efficient and sensitive protein sequence filtering algorithms are essential for speeding up database search. In this paper, we propose a novel filtering algorithm, which generates spectrum graphs from subspectra of the query spectrum and searches them against the protein database to find good candidates. Compared with the sequence tag and gapped tag approaches, the proposed method circumvents the step of tag extraction, thus simplifying data processing. Experimental results on real data showed that the proposed method achieved both high speed and high sensitivity in protein sequence filtration.
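
For readers unfamiliar with the term, the general notion of a spectrum graph used in the last abstract can be sketched as follows. This is a minimal illustration of the textbook idea only, not the filtering algorithm of the paper: nodes are peak masses, and two nodes are joined when their mass difference matches an amino acid residue mass. The function name `build_spectrum_graph`, the tolerance `tol`, the reduced residue table, and the example masses are all assumptions made for this sketch.

```python
from itertools import combinations

# Monoisotopic residue masses in daltons. Only a small subset of the
# 20 amino acids is listed, to keep the illustration short.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "P": 97.05276,
    "V": 99.06841,
    "T": 101.04768,
    "L": 113.08406,
}


def build_spectrum_graph(prefix_masses, tol=0.01):
    """Build a simple spectrum graph from a list of peak masses.

    Returns the sorted masses and a list of edges (i, j, residue), where
    i < j index into the sorted masses and masses[j] - masses[i] equals
    the mass of `residue` within `tol` daltons.
    """
    masses = sorted(prefix_masses)
    edges = []
    for i, j in combinations(range(len(masses)), 2):
        diff = masses[j] - masses[i]
        for residue, mass in RESIDUE_MASS.items():
            if abs(diff - mass) <= tol:
                edges.append((i, j, residue))
    return masses, edges


# Hypothetical peaks corresponding to the prefixes of the peptide "GAS".
masses, edges = build_spectrum_graph([0.0, 57.02146, 128.05857, 215.0906])
for i, j, residue in edges:
    print(f"{masses[i]:.5f} -> {masses[j]:.5f}: {residue}")
```

In this sketch a path through the graph spells a candidate amino acid sequence. How the paper derives graphs from subspectra and searches them against the database is not described in the abstract and is not modeled here.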