A nonparametric Bayesian perspective for machine learning in partially-observed settings

Akova, Ferit

Files

Ferit_Akova_PhD_dissertation.pdf (1.29 MB)

Date

2014-07-31

Authors

Akova, Ferit

Language

American English

Committee Chair

Dundar, Mehmet Murat

Committee Members

Qi, Yuan Alan

Degree

Ph.D.

Degree Year

2013

Grantor

Purdue University

Abstract

The robustness and generalizability of supervised learning algorithms depend on how well the labeled data set represents the real-life problem. In many real-world domains, however, we may not have full knowledge of the underlying data-generating mechanism, which may even evolve over time and continually introduce new classes. This constitutes a partially-observed setting, in which it is impractical to obtain a labeled data set that exhaustively covers a fixed set of classes. Traditional supervised learning algorithms, which assume an exhaustive training library, misclassify a future sample from an unobserved class with probability one, leading to an ill-defined classification problem. Our goal is to address situations where this assumption is violated by a non-exhaustive training library, a realistic yet overlooked issue in supervised learning.

In this dissertation we pursue a new direction for supervised learning by defining self-adjusting models that relax the fixed-model assumption imposed on classes and their distributions. We let the model adapt to the prospective data by dynamically adding new classes or components as the data demand, which in turn gradually makes the model more representative of the entire population. In this framework, we first employ suitably chosen nonparametric priors to model the class distributions of both observed and unobserved classes, and then use new inference methods to classify samples from observed classes and to discover and model novel classes for samples from unobserved classes.

This thesis presents the initial steps of an ongoing effort to address one of the most overlooked bottlenecks in supervised learning, and indicates the potential for new perspectives on some of the most heavily studied areas of machine learning: novelty detection, online class discovery, and semi-supervised learning.

Description

Indiana University-Purdue University Indianapolis (IUPUI)

Keywords

Nonparametric Bayesian; Nonexhaustive; Supervised; Semi-supervised

LC Subjects

Bayesian statistical decision theory -- Research -- Analysis -- Evaluation; Statistical decision; Supervised learning (Machine learning) -- Research; Nonparametric statistics -- Research; Mathematical statistics; Stochastic processes; Boosting (Algorithms); Statistics -- Data processing; Machine learning; Mathematical statistics -- Data processing; Discourse analysis -- Statistical methods; Computational linguistics; Data mining; Computational intelligence

Type

Thesis

Permanent Link

https://hdl.handle.net/1805/4825
http://dx.doi.org/10.7912/C2/2316

Collections

Computer & Information Science Department Theses and Dissertations
