Browsing by Subject "Top-down mass spectrometry"
Characterization of Proteoform Post-Translational Modifications by Top-Down and Bottom-Up Mass Spectrometry in Conjunction with Annotations
(American Chemical Society, 2023) Chen, Wenrong; Ding, Zhengming; Zang, Yong; Liu, Xiaowen; BioHealth Informatics, School of Informatics and Computing
Many proteoforms can be produced from a gene due to genetic mutations, alternative splicing, post-translational modifications (PTMs), and other variations. PTMs in proteoforms play critical roles in cell signaling, protein degradation, and other biological processes. Mass spectrometry (MS) is the primary technique for investigating PTMs in proteoforms, and two alternative MS approaches, top-down and bottom-up, have complementary strengths. The combination of the two approaches has the potential to increase the sensitivity and accuracy in PTM identification and characterization. In addition, protein and PTM knowledge bases, such as UniProt, provide valuable information for PTM characterization and verification. Here, we present a software pipeline PTM-TBA (PTM characterization by Top-down and Bottom-up MS and Annotations) for identifying and localizing PTMs in proteoforms by integrating top-down and bottom-up MS as well as PTM annotations. We assessed PTM-TBA using a technical triplicate of bottom-up and top-down MS data of SW480 cells. On average, database search of the top-down MS data identified 2000 mass shifts, 814.5 (40.7%) of which were matched to 11 common PTMs and 423 of which were localized.
Of the mass shifts identified by top-down MS, PTM-TBA verified 435 mass shifts using the bottom-up MS data and UniProt annotations.

Evaluation of Machine Learning Models for Proteoform Retention and Migration Time Prediction in Top-Down Mass Spectrometry
(American Chemical Society, 2022) Chen, Wenrong; McCool, Elijah N.; Sun, Liangliang; Zang, Yong; Ning, Xia; Liu, Xiaowen; BioHealth Informatics, School of Informatics and Computing
Reversed-phase liquid chromatography (RPLC) and capillary zone electrophoresis (CZE) are two primary proteoform separation methods in mass spectrometry (MS)-based top-down proteomics. Proteoform retention time (RT) prediction in RPLC and migration time (MT) prediction in CZE provide additional information for accurate proteoform identification and quantification. While existing methods are mainly focused on peptide RT and MT prediction in bottom-up MS, there is still a lack of methods for proteoform RT and MT prediction in top-down MS. We systematically evaluated eight machine learning models and a transfer learning method for proteoform RT prediction and five models and the transfer learning method for proteoform MT prediction. Experimental results showed that a gated recurrent unit (GRU)-based model with transfer learning achieved a high accuracy (R = 0.978) for proteoform RT prediction and that the GRU-based model and a fully connected neural network model obtained a high accuracy of R = 0.982 and 0.981 for proteoform MT prediction, respectively.

A Markov chain Monte Carlo method for estimating the statistical significance of proteoform identifications by top-down mass spectrometry
(ACS, 2019-03) Kou, Qiang; Wang, Zhe; Lubeckyj, Rachele A.; Wu, Si; Liu, Xiaowen; BioHealth Informatics, School of Informatics and Computing
Top-down mass spectrometry is capable of identifying whole proteoform sequences with multiple post-translational modifications because it generates tandem mass spectra directly from intact proteoforms.
Many software tools, such as ProSightPC, MSPathFinder, and TopMG, have been proposed for identifying proteoforms with modifications. In these tools, various methods are employed to estimate the statistical significance of identifications. However, most existing methods are designed for proteoform identifications without modifications, and the challenge remains for accurately estimating the statistical significance of proteoform identifications with modifications. Here we propose TopMCMC, a method that combines a Markov chain random walk algorithm and a greedy algorithm for assigning statistical significance to matches between spectra and protein sequences with variable modifications. Experimental results showed that TopMCMC achieved high accuracy in estimating E-values and false discovery rates of identifications in top-down mass spectrometry. Coupled with TopMG, TopMCMC identified more spectra than the generating function method from an MCF-7 top-down mass spectrometry data set.

Proteoform Identification by Combining RNA-Seq and Top-down Mass Spectrometry
(American Chemical Society, 2021) Chen, Wenrong; Liu, Xiaowen; BioHealth Informatics, School of Informatics and Computing
In proteogenomic studies, genomic and transcriptomic variants are incorporated into customized protein databases for the identification of proteoforms, especially proteoforms with sample-specific variants. Most proteogenomic research has been focused on combining genomic or transcriptomic data with bottom-up mass spectrometry data. In the last decade, top-down mass spectrometry has attracted increasing attention because of its capacity to identify various proteoforms with alterations. However, top-down proteogenomics, in which genomic or transcriptomic data are combined with top-down mass spectrometry data, has not been widely adopted, and there is still a lack of software tools for top-down proteogenomic data analysis.
In this paper, we introduce TopPG, a proteogenomic tool for generating proteoform sequence databases with genetic alterations and alternative splicing events. Experiments on top-down proteogenomic data of DLD-1 colorectal cancer cells showed that TopPG coupled with database search confidently identified proteoforms with sample-specific alterations.

SpectroGene: A Tool for Proteogenomic Annotations Using Top-Down Spectra
(ACS Publications, 2016-01-04) Kolmogorov, Mikhail; Liu, Xiaowen; Pevzner, Pavel A.; BioHealth Informatics, School of Informatics and Computing
In the past decade, proteogenomics has emerged as a valuable technique that contributes to the state-of-the-art in genome annotation; however, previous proteogenomic studies were limited to bottom-up mass spectrometry and did not take advantage of top-down approaches. We show that top-down proteogenomics allows one to address the problems that remained beyond the reach of traditional bottom-up proteogenomics. In particular, we show that top-down proteogenomics leads to the discovery of previously unannotated genes even in extensively studied bacterial genomes and present SpectroGene, a software tool for genome annotation using top-down tandem mass spectra. We further show that top-down proteogenomics searches (against the six-frame translation of a genome) identify nearly all proteoforms found in traditional top-down proteomics searches (against the annotated proteome). SpectroGene is freely available at http://github.com/fenderglass/SpectroGene.

TopDIA: A Software Tool for Top-Down Data-Independent Acquisition Proteomics
(bioRxiv, 2024-04-09) Basharat, Abdul Rehman; Xiong, Xingzhao; Xu, Tian; Zang, Yong; Sun, Liangliang; Liu, Xiaowen; Biomedical Engineering and Informatics, Luddy School of Informatics, Computing, and Engineering
Top-down mass spectrometry is widely used for proteoform identification, characterization, and quantification owing to its ability to analyze intact proteoforms.
In the last decade, top-down proteomics has been dominated by top-down data-dependent acquisition mass spectrometry (TD-DDA-MS), and top-down data-independent acquisition mass spectrometry (TD-DIA-MS) has not been well studied. While TD-DIA-MS produces complex multiplexed tandem mass spectrometry (MS/MS) spectra, which are challenging to confidently identify, it selects more precursor ions for MS/MS analysis and has the potential to increase proteoform identifications compared with TD-DDA-MS. Here we present TopDIA, the first software tool for proteoform identification by TD-DIA-MS. It generates demultiplexed pseudo MS/MS spectra from TD-DIA-MS data and then searches the pseudo MS/MS spectra against a protein sequence database for proteoform identification. We compared the performance of TD-DDA-MS and TD-DIA-MS using Escherichia coli K-12 MG1655 cells and demonstrated that TD-DIA-MS with TopDIA increased proteoform and protein identifications compared with TD-DDA-MS.

TopMSV: A Web-Based Tool for Top-Down Mass Spectrometry Data Visualization
(American Chemical Society, 2021) Choi, In Kwon; Jiang, Tianze; Kankara, Sreekanth Reddy; Wu, Si; Liu, Xiaowen; BioHealth Informatics, School of Informatics and Computing
Top-down mass spectrometry (MS) investigates intact proteoforms for proteoform identification, characterization, and quantification. Data visualization plays an essential role in top-down MS data analysis because proteoform identification and characterization often involve manual data inspection to determine the molecular masses of highly charged ions and validate unexpected alterations in identified proteoforms. While many software tools have been developed for MS data visualization, there is still a lack of web-based visualization software designed for top-down MS. Here, we present TopMSV, a web-based tool for top-down MS data processing and visualization. TopMSV provides interactive views of top-down MS data using a web browser.
It integrates software tools for spectral deconvolution and proteoform identification and uses analysis results of the tools to annotate top-down MS data.