Protein Fold Recognition Using Adaboost Learning Strategy

Su, Yijing

Files

Su_11-15.pdf (296.71 KB)

Authors

Su, Yijing

Language

American English

Degree

M.S.

Degree Year

2007-12

Department

School of Informatics

Grantor

Indiana University

Abstract

Protein structure prediction is one of the most important and difficult problems in computational molecular biology. Unlike sequence-only comparison, protein fold recognition based on machine learning algorithms attempts to detect similarities between protein structures that may not be accompanied by any significant sequence similarity. It takes advantage of structural and physical properties beyond sequence information. In this thesis, we present a novel classifier for protein fold recognition that combines the AdaBoost algorithm with a k-nearest-neighbor classifier. The experimental framework consists of two tasks: (i) cross-validation within the training dataset, and (ii) testing on an unseen validation dataset, in which 90% of the proteins have less than 25% sequence identity with the training samples. Our method achieves a 64.7% success rate in classifying the independent validation dataset into 27 protein fold types. These experiments demonstrate the merit of the approach: coupling the AdaBoost strategy with weak learning classifiers yields improved and robust performance, with 64.7% accuracy versus 61.2% in the published literature using identical sample sets, feature representation, and class labels.
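The abstract does not say how AdaBoost and the nearest-neighbor classifier are coupled, so the following is only a minimal sketch of one common way to do it, not the thesis's implementation. It assumes the multi-class SAMME variant of AdaBoost and boosting by weighted resampling (because a nearest-neighbor learner cannot use sample weights directly). The function names, the toy two-dimensional data, and the parameter values are all hypothetical.

```python
import math
import random
from collections import Counter


def knn_predict(train, query, k):
    """Majority label among the k training points closest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def fit_boosted_knn(data, n_classes, rounds=10, k=1, seed=0):
    """Boost a k-NN weak learner (assumed: SAMME weights, resampling)."""
    rng = random.Random(seed)
    n = len(data)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, resampled training set)
    for _ in range(rounds):
        # Draw a training set in which hard examples appear more often.
        sample = rng.choices(data, weights=weights, k=n)
        miss = [knn_predict(sample, x, k) != y for x, y in data]
        err = sum(w for w, m in zip(weights, miss) if m)
        if err >= 1.0 - 1.0 / n_classes:
            continue  # no better than chance: discard this round
        err = max(err, 1e-10)  # avoid log(0) when the learner is perfect
        alpha = math.log((1.0 - err) / err) + math.log(n_classes - 1)
        ensemble.append((alpha, sample))
        # Increase the weight of misclassified examples, then renormalize.
        weights = [w * math.exp(alpha) if m else w
                   for w, m in zip(weights, miss)]
        total = sum(weights)
        weights = [w / total for w in weights]
    if not ensemble:
        ensemble.append((1.0, list(data)))  # fall back to plain k-NN
    return ensemble


def predict(ensemble, query, k=1):
    """Weighted vote of all boosted k-NN members."""
    scores = Counter()
    for alpha, sample in ensemble:
        scores[knn_predict(sample, query, k)] += alpha
    return scores.most_common(1)[0][0]


# Toy data (hypothetical): three well-separated 2-D clusters, labels 0..2.
_rng = random.Random(1)
centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
data = [((_rng.gauss(cx, 0.5), _rng.gauss(cy, 0.5)), label)
        for label, (cx, cy) in enumerate(centers)
        for _ in range(20)]

ensemble = fit_boosted_knn(data, n_classes=3, rounds=10, k=1)
```

In a setting like the one described above, `data` would hold one feature vector and fold label per protein and `n_classes` would be 27; the evaluation would then compare `predict` against the held-out labels of the independent validation set.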

Keywords

Adaboost, Recognition, Learning Strategy, Protein Fold

Type

Thesis

Permanent Link

https://hdl.handle.net/1805/2267
http://dx.doi.org/10.7912/C2/889

Collections

Informatics Graduate Theses and PhD Dissertations
Informatics School Theses and Dissertations