Browsing by Author "Paša-Tolić, Ljiljana"
Item Characterization of proteoforms with unknown post-translational modifications using the MIScore (ACS, 2016)
Kou, Qiang; Zhu, Binhai; Wu, Si; Ansong, Charles; Tolić, Nikola; Paša-Tolić, Ljiljana; Liu, Xiaowen; Department of Biohealth Informatics, School of Informatics and Computing
Various proteoforms may be generated from a single gene due to primary structure alterations (PSAs) such as genetic variations, alternative splicing, and post-translational modifications (PTMs). Top-down mass spectrometry is capable of analyzing intact proteins and identifying patterns of multiple PSAs, making it the method of choice for studying complex proteoforms. In top-down proteomics, proteoform identification is often performed by searching tandem mass spectra against a protein sequence database that contains only one reference protein sequence for each gene or transcript variant in a proteome. Because of the incompleteness of the protein database, an identified proteoform may contain unknown PSAs compared with the reference sequence. Proteoform characterization is to identify and localize PSAs in a proteoform. Although many software tools have been proposed for proteoform identification by top-down mass spectrometry, the characterization of proteoforms in identified proteoform–spectrum matches still relies mainly on manual annotation. We propose to use the Modification Identification Score (MIScore), which is based on Bayesian models, to automatically identify and localize PTMs in proteoforms. Experiments showed that the MIScore is accurate in identifying and localizing one or two modifications.

Item A mass graph-based approach for the identification of modified proteoforms using top-down tandem mass spectra (Oxford, 2017-05-01)
Kou, Qiang; Wu, Si; Tolić, Nikola; Paša-Tolić, Ljiljana; Liu, Yunlong; Liu, Xiaowen; BioHealth Informatics, School of Informatics and Computing
Motivation: Although proteomics has rapidly developed in the past decade, researchers are still in the early stage of exploring the world of complex proteoforms, which are protein products with various primary structure alterations resulting from gene mutations, alternative splicing, post-translational modifications, and other biological processes. Proteoform identification is essential to mapping proteoforms to their biological functions as well as discovering novel proteoforms and new protein functions. Top-down mass spectrometry is the method of choice for identifying complex proteoforms because it provides a 'bird's eye view' of intact proteoforms. The combinatorial explosion of various alterations on a protein may result in billions of possible proteoforms, making proteoform identification a challenging computational problem.
Results: We propose a new data structure, called the mass graph, for efficient representation of proteoforms and design mass graph alignment algorithms. We developed TopMG, a mass graph-based software tool for proteoform identification by top-down mass spectrometry. Experiments on top-down mass spectrometry datasets showed that TopMG outperformed existing methods in identifying complex proteoforms.

Item De Novo Sequencing of Peptides from High-Resolution Bottom-Up Tandem Mass Spectra using Top-Down Intended Methods (Wiley, 2017-12)
Vyatkina, Kira; Dekker, Lennard J. M.; Wu, Si; VanDuijn, Martijn M.; Liu, Xiaowen; Tolić, Nikola; Luider, Theo M.; Paša-Tolić, Ljiljana; BioHealth Informatics, School of Informatics and Computing
Despite high-resolution mass spectrometers are becoming accessible for more and more laboratories, tandem (MS/MS) mass spectra are still often collected at a low resolution. And even if acquired at a high resolution, software tools used for their processing do not tend to benefit from that in full, and an ability to specify a relative mass tolerance in this case often remains the only feature the respective algorithms take advantage of. We argue that a more efficient way to analyze high-resolution MS/MS spectra should be with methods more explicitly accounting for the precision level, and sustain this claim through demonstrating that a de novo sequencing framework originally developed for (high-resolution) top-down MS/MS data is perfectly suitable for processing high-resolution bottom-up datasets, even though a top-down like deconvolution performed as the first step will leave in many spectra at most a few peaks.

Item Top-down analysis of protein samples by de novo sequencing techniques (Oxford, 2016-09)
Vyatkina, Kira; Wu, Si; Dekker, Lennard J. M.; VanDuijn, Martijn M.; Liu, Xiaowen; Tolić, Nikola; Luider, Theo M.; Paša-Tolić, Ljiljana; Pevzner, Pavel A.; Department of Biohealth Informatics, School of Informatics and Computing
Motivation: Recent technological advances have made high-resolution mass spectrometers affordable to many laboratories, thus boosting rapid development of top-down mass spectrometry, and implying a need in efficient methods for analyzing this kind of data.
Results: We describe a method for analysis of protein samples from top-down tandem mass spectrometry data, which capitalizes on de novo sequencing of fragments of the proteins present in the sample. Our algorithm takes as input a set of de novo amino acid strings derived from the given mass spectra using the recently proposed Twister approach, and combines them into aggregated strings endowed with offsets. The former typically constitute accurate sequence fragments of sufficiently well-represented proteins from the sample being analyzed, while the latter indicate their location in the protein sequence, and also bear information on post-translational modifications and fragmentation patterns.
Availability and Implementation: Freely available on the web at http://bioinf.spbau.ru/en/twister.