Complex Proteoform Identification Using Top-Down Mass Spectrometry

dc.contributor.advisor: Wu, Huanmei
dc.contributor.author: Kou, Qiang
dc.contributor.other: Liu, Xiaowen
dc.contributor.other: Liu, Yunlong
dc.contributor.other: Al Hasan, Mohammad
dc.date.accessioned: 2019-01-07T20:45:41Z
dc.date.available: 2019-06-21T09:30:14Z
dc.date.issued: 2018-12
dc.degree.date: 2018
dc.degree.discipline:
dc.degree.grantor: Indiana University
dc.degree.level: Ph.D.
dc.description: Indiana University-Purdue University Indianapolis (IUPUI)
dc.description.abstract: Proteoforms are distinct protein molecule forms created by variations in genes, gene expression, and other biological processes. Many proteoforms contain multiple primary structural alterations, including amino acid substitutions, terminal truncations, and post-translational modifications. These primary structural alterations play a crucial role in determining protein functions: proteoforms from the same protein with different alterations may exhibit different functional behaviors. Top-down mass spectrometry directly analyzes intact proteoforms and provides complete sequence information for them, and it has therefore become the method of choice for identifying complex proteoforms. Instruments and experimental protocols for top-down mass spectrometry have advanced rapidly in the past several years. However, many computational problems in this area remain unsolved, and software tools for analyzing such data are still at an early stage of development. In this dissertation, we propose several novel algorithms for challenging computational problems in proteoform identification by top-down mass spectrometry. First, we present two approximate spectrum-based protein sequence filtering algorithms that quickly find a small number of candidate proteins in a large proteome database for a query mass spectrum. Second, we describe mass graph-based alignment algorithms that efficiently identify proteoforms with variable post-translational modifications and/or terminal truncations. Third, we propose a Markov chain Monte Carlo method for estimating the statistical significance of identified proteoform spectrum matches. These are the first efficient algorithms that take into account all three types of alterations in proteoform identification: variable post-translational modifications, unexpected alterations, and terminal truncations. As a result, they are more sensitive and powerful than existing methods that consider only one or two of the three types of alterations. All the proposed algorithms have been incorporated into TopMG, a complete software pipeline for complex proteoform identification. Experimental results showed that TopMG yields significantly more identifications than other existing methods in proteome-level top-down mass spectrometry studies. TopMG will facilitate the applications of top-down mass spectrometry in many areas, such as the identification and quantification of clinically relevant proteoforms and the discovery of new proteoform biomarkers.
dc.description.embargo: 2019-06-21
dc.identifier.uri: https://hdl.handle.net/1805/18094
dc.identifier.uri: http://dx.doi.org/10.7912/C2/931
dc.language.iso: en_US
dc.subject: Algorithms
dc.subject: Alignment
dc.subject: Bioinformatics
dc.subject: Database searching
dc.subject: Proteoform
dc.subject: Top-Down Mass Spectrometry
dc.title: Complex Proteoform Identification Using Top-Down Mass Spectrometry
dc.type: Dissertation
Files
Original bundle
Name: Kou_iupui_0104D_10340.pdf
Size: 770.23 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.99 KB
Format: Item-specific license agreed upon to submission