Spectral Deconvolution, Feature Detection, and Proteoform Identification for Top-Down Proteomics

Date
2024-12
Language
American English
Degree
Ph.D.
Degree Year
2024
Department
Luddy School of Informatics, Computing and Engineering
Grantor
Indiana University
Abstract

Liquid chromatography-mass spectrometry (LC-MS) is widely used for proteoform identification, characterization, and quantification. Bottom-up proteomics analyzes enzymatically digested peptides, whereas top-down proteomics analyzes intact proteoforms, enabling comprehensive identification of proteoforms with post-translational modifications (PTMs), genetic mutations, and alternative splicing. Because elements occur as multiple isotopes, protein ions with the same chemical composition and charge state produce a group of peaks at different mass-to-charge ratios (m/z), called an isotopic envelope. A top-down mass spectrum often contains hundreds of envelopes from high-charge-state ions, some of which overlap, and this spectral complexity makes top-down MS data computationally challenging to analyze. This dissertation introduces three new software tools, EnvCNN, TopFD, and TopDIA, for improving proteoform identification, characterization, and quantification in top-down MS data analysis. EnvCNN is a deep-learning model for evaluating isotopic envelopes of proteoforms and their fragments. The model aims to improve the accuracy of reported fragments, thereby increasing the number of identified proteoforms and improving the reliability of proteoform identification and characterization. TopFD is a software tool for proteoform feature detection that groups all peaks of a proteoform in an LC-MS map into a single feature. TopFD outperforms existing tools in the accuracy and reproducibility of feature detection, thereby improving proteoform identification and quantification. TopDIA is the first software tool for proteoform identification by top-down data-independent acquisition MS (TD-DIA-MS).
Unlike conventional top-down data-dependent acquisition MS (TD-DDA-MS), which relies on intensity-based proteoform selection to generate fragment mass spectra, TD-DIA-MS fragments all proteoforms within predefined isolation windows, generating fragment mass spectra for every proteoform. TopDIA processes TD-DIA-MS data to generate demultiplexed pseudo spectra, which are searched against a protein database for proteoform identification, leading to a significant increase in the number of identified proteoforms compared with TD-DDA-MS. In summary, these new software tools help advance proteomics research by increasing the accuracy and comprehensiveness of proteoform analysis by top-down MS.
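
As an illustration of the isotopic envelope described in the abstract, the following minimal Python sketch computes theoretical m/z positions for the peaks of one envelope. It is not taken from EnvCNN, TopFD, or TopDIA. The function name `envelope_mz` and the constant names are hypothetical, and the isotope spacing is approximated by the 13C-12C mass difference, which is a simplifying assumption.

```python
# Minimal sketch, not the dissertation's method: theoretical m/z positions of
# the peaks in one isotopic envelope for a positively charged, protonated ion.

PROTON_MASS = 1.007276466   # Da, mass of a proton
ISOTOPE_SPACING = 1.0033548  # Da, 13C - 12C mass difference (assumed spacing)


def envelope_mz(monoisotopic_mass, charge, n_peaks):
    """Return m/z values of the first n_peaks isotopic peaks.

    monoisotopic_mass: neutral monoisotopic mass in Da
    charge: positive integer charge state (number of added protons)
    n_peaks: number of isotopic peaks to list
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return [
        (monoisotopic_mass + k * ISOTOPE_SPACING + charge * PROTON_MASS) / charge
        for k in range(n_peaks)
    ]


if __name__ == "__main__":
    # Hypothetical 10 kDa proteoform observed at charge state 10.
    for mz in envelope_mz(10000.0, 10, 5):
        print(f"{mz:.4f}")
```

Under these assumptions, adjacent peaks in an envelope are separated by roughly 1.0034 / z in m/z, so at charge state 10 the spacing is about 0.1. This narrow spacing is one reason envelopes from high-charge-state ions overlap and are hard to separate.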

Description
IUI
Type
Thesis