Spectral Deconvolution, Feature Detection, and Proteoform Identification for Top-Down Proteomics

dc.contributor.advisor: Yan, Jingwen
dc.contributor.author: Basharat, Abdul Rehman
dc.contributor.other: Liu, Xiaowen
dc.contributor.other: Zang, Yong
dc.contributor.other: Wang, Juexin
dc.contributor.other: Wan, Jun
dc.contributor.other: Luo, Xiao
dc.date.accessioned: 2025-01-09T09:22:31Z
dc.date.available: 2025-01-09T09:22:31Z
dc.date.issued: 2024-12
dc.degree.date: 2024
dc.degree.discipline: Luddy School of Informatics, Computing and Engineering
dc.degree.grantor: Indiana University
dc.degree.level: Ph.D.
dc.description: IUI
dc.description.abstract: Liquid chromatography-based mass spectrometry (LC-MS) is widely used for proteoform identification, characterization, and quantification. Bottom-up proteomics analyzes enzymatically digested peptides, while top-down proteomics examines intact proteoforms, enabling comprehensive identification of proteoforms with post-translational modifications (PTMs), genetic mutations, and alternative splicing. In MS data, because of naturally occurring isotopes, proteins with the same chemical composition and charge state produce a group of peaks with different mass-to-charge ratios (m/z), called an isotopic envelope. A top-down mass spectrum often contains hundreds of high-charge-state envelopes, some of which overlap. This complexity makes the analysis of top-down MS data computationally challenging. This dissertation introduces three new software tools, EnvCNN, TopFD, and TopDIA, for enhancing proteoform identification, characterization, and quantification in top-down MS data analysis. EnvCNN is a deep-learning model for evaluating isotopic envelopes of proteoforms and their fragments. The model aims to improve the accuracy of reported fragments, thus increasing the number of identified proteoforms and improving the reliability of proteoform identification and characterization. TopFD is a software tool for proteoform feature detection that groups all peaks of a proteoform in an LC-MS map into a single feature. TopFD outperforms existing tools in the accuracy and reproducibility of feature detection, thereby improving proteoform identification and quantification. TopDIA is the first software tool for proteoform identification by top-down data-independent acquisition MS (TD-DIA-MS). Unlike conventional top-down data-dependent acquisition MS (TD-DDA-MS), which relies on intensity-based proteoform selection to generate fragment mass spectra, TD-DIA-MS fragments all proteoforms within predefined isolation windows, generating fragment mass spectra for every proteoform. TopDIA processes TD-DIA-MS data to generate demultiplexed pseudo spectra, which are searched against a protein database for proteoform identification, leading to a significant increase in the number of identified proteoforms compared with TD-DDA-MS. In summary, these new software tools help advance proteomics research by increasing the accuracy and comprehensiveness of proteoform analysis by top-down MS.
dc.identifier.uri: https://hdl.handle.net/1805/45215
dc.language.iso: en_US
dc.subject: Feature Detection
dc.subject: Isotopic Deconvolution
dc.subject: Mass Spectrometry
dc.subject: Proteoform Identification
dc.subject: Spectral Acquisition
dc.subject: Top-Down Proteomics
dc.title: Spectral Deconvolution, Feature Detection, and Proteoform Identification for Top-Down Proteomics
dc.type: Thesis
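
Illustrative note on the abstract's definition of an isotopic envelope: the sketch below is not taken from the dissertation or from EnvCNN, TopFD, or TopDIA. It only shows, under simple assumptions, why one proteoform at one charge state yields a group of peaks at different m/z values. The function name envelope_mz, the constant names, and the fixed isotope spacing of about 1.00335 Da (the 13C/12C mass difference) are assumptions made for illustration; real envelopes also need peak intensities, which this sketch ignores.

```python
# Hypothetical helper, not part of any tool named in the abstract.
PROTON_MASS = 1.007276466   # Da, mass of the charge-carrying proton
ISOTOPE_SPACING = 1.00335   # Da, approximate 13C - 12C mass difference (assumed uniform)


def envelope_mz(monoisotopic_mass, charge, n_peaks=5):
    """Return theoretical m/z positions of the first n_peaks isotopic peaks.

    monoisotopic_mass: neutral monoisotopic mass in Da
    charge: positive integer charge state (protonated ions assumed)
    """
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    # m/z of the monoisotopic peak for an ion carrying `charge` protons
    base = (monoisotopic_mass + charge * PROTON_MASS) / charge
    # Each extra heavy isotope shifts the peak by ISOTOPE_SPACING / charge
    return [base + k * ISOTOPE_SPACING / charge for k in range(n_peaks)]


if __name__ == "__main__":
    # A 10 kDa proteoform at charge 10: peaks are spaced about 0.1 m/z apart
    for mz in envelope_mz(10000.0, 10, n_peaks=4):
        print(f"{mz:.4f}")
```

Because the spacing between adjacent peaks is roughly 1.00335 / charge, higher charge states pack the peaks more tightly, which is one reason the high-charge envelopes described in the abstract can overlap.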
Files
Original bundle
Name: Basharat_iuindianapolis_2432A_10842.pdf
Size: 9.27 MB
Format: Adobe Portable Document Format