classCleaner: A Quantitative Method for Validating Peptide Identification in LC-MS/MS Workflows

Key, Melissa Chester

Files

Key_iupui_0104D_10440.pdf (5.46 MB)

Date

2020-05

Authors

Key, Melissa Chester

Language

American English

Committee Chair

Boukai, Benzion

Committee Members

Ragg, Susanne
Katz, Barry
Mosley, Amber

Degree

Ph.D.

Degree Year

2020

Grantor

Indiana University

Abstract

Because label-free liquid chromatography-tandem mass spectrometry (LC-MS/MS) shotgun proteomics infers the peptide sequence of each measurement, there is inherent uncertainty in the identity of each peptide and its originating protein. Removing misidentified peptides can improve the accuracy and power of downstream analyses when differences between proteins are of primary interest. In this dissertation I present classCleaner, a novel algorithm designed to detect misidentified peptides within each protein using the available quantitative data. The algorithm is based on the idea that distances between peptides belonging to the same protein are stochastically smaller than those between peptides in different proteins. The method first determines a threshold based on the estimated distributions of these two groups of distances. This threshold is then used to create a decision rule for each peptide, based on the number of within-protein distances that fall below the threshold. Using simulated data, I show that classCleaner always reduces the proportion of misidentified peptides, with better results for larger proteins (by number of constituent peptides), smaller inherent misidentification rates, and larger sample sizes. I also apply classCleaner to an LC-MS/MS proteomics data set and to the Congressional Voting Records data set from the UCI Machine Learning Repository. The latter demonstrates that the algorithm is not specific to proteomics.
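
The two steps described in the abstract (a distance threshold, then a per-peptide count of within-protein distances below it) can be illustrated with a minimal Python sketch. This is not the dissertation's implementation. The function names `pairwise_distances` and `flag_peptides`, the use of Euclidean distance, the threshold taken as the midpoint between the median within-protein and median between-protein distances, and the `min_fraction` cutoff are all assumptions made for illustration; the dissertation derives its own threshold and decision rule.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X (assumed metric)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def flag_peptides(X, labels, min_fraction=0.5):
    """Return (keep, tau): keep[i] is False when peptide i looks misassigned.

    X            : (n_peptides, n_samples) array of quantitative values
    labels       : protein label assigned to each peptide
    min_fraction : hypothetical cutoff on the share of close within-protein peers
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = len(labels)
    D = pairwise_distances(X)

    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(n, dtype=bool)

    # Assumed threshold: midpoint between the medians of the two distance groups.
    within_median = np.median(D[same & off_diag])
    between_median = np.median(D[~same])
    tau = 0.5 * (within_median + between_median)

    keep = np.ones(n, dtype=bool)
    for i in range(n):
        peers = same[i] & off_diag[i]
        n_peers = int(peers.sum())
        if n_peers == 0:
            continue  # a single-peptide protein cannot be tested this way
        # Count within-protein distances that fall below the threshold.
        n_close = int((D[i, peers] < tau).sum())
        keep[i] = n_close >= min_fraction * n_peers
    return keep, tau


# Toy data: the fourth peptide is labeled "A" but behaves like protein "B".
X = [
    [0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [10.0, 10.0, 10.1],
    [10.0, 10.0, 10.0], [10.1, 10.0, 10.0], [10.0, 10.1, 10.0], [10.0, 10.0, 9.9],
]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
keep, tau = flag_peptides(X, labels)
print(keep, tau)
```

In this toy example the fourth peptide has no within-protein distance below the threshold, so it is the only one flagged. The sketch only conveys the counting idea; it says nothing about the error rates reported for classCleaner.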

Description

Indiana University-Purdue University Indianapolis (IUPUI)

Keywords

class labels, classification, filtering, outliers, proteomics

Type

Thesis

Permanent Link

https://hdl.handle.net/1805/22837
http://dx.doi.org/10.7912/C2/2814

Collections

Biostatistics Department Theses and Dissertations