Batch Discovery of Recurring Rare Classes toward Identifying Anomalous Samples

Date
2014
Language
American English
Found At
ACM
Abstract

We present a clustering algorithm for discovering rare yet significant recurring classes across a batch of samples in the presence of random effects. We model the data in each sample by an infinite mixture of Dirichlet-process Gaussian-mixture models (DPMs), with each DPM representing the noisy realization of its corresponding class distribution in a given sample. We introduce dependencies across multiple samples by placing a global Dirichlet process prior over the individual DPMs. This hierarchical prior introduces a sharing mechanism across samples and allows local realizations of classes to be identified across samples. We use a collapsed Gibbs sampler for inference to recover the local DPMs and identify their class associations. We demonstrate the utility of the proposed algorithm by processing a flow cytometry data set containing two extremely rare cell populations, and report results that significantly outperform competing techniques. The source code of the proposed algorithm is available on the web at http://cs.iupui.edu/~dundar/aspire.htm.
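
To make the data setting in the abstract concrete, the sketch below simulates a batch of samples in which globally shared classes reappear in every sample with a small sample-specific shift (the random effect). It is an illustration only and not the authors' implementation: it replaces the Dirichlet process prior with a truncated stick-breaking approximation, represents each local class by a single Gaussian rather than a full DPM, and performs no inference. The function names and all parameters (`alpha`, `truncation`, `class_sd`, `effect_sd`, `noise_sd`) are hypothetical choices made for this example.

```python
import numpy as np


def stick_breaking(alpha, truncation, rng):
    """Truncated stick-breaking weights (an approximation to a DP prior)."""
    betas = rng.beta(1.0, alpha, size=truncation)
    betas[-1] = 1.0  # close the stick so the weights sum to one
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining


def simulate_batch(n_samples=3, n_points=500, alpha=1.0, truncation=10,
                   class_sd=5.0, effect_sd=0.5, noise_sd=0.2, seed=0):
    """Simulate samples that share classes but realize them with random effects.

    Assumption: each local class is a single 2-D Gaussian, a simplification
    of the per-class mixture models described in the abstract.
    """
    rng = np.random.default_rng(seed)
    # Global class proportions; small weights play the role of rare classes.
    weights = stick_breaking(alpha, truncation, rng)
    # Global class centers shared by every sample in the batch.
    class_means = rng.normal(0.0, class_sd, size=(truncation, 2))
    batch = []
    for _ in range(n_samples):
        # Random effect: each sample sees a perturbed copy of every class.
        local_means = class_means + rng.normal(0.0, effect_sd,
                                               size=class_means.shape)
        labels = rng.choice(truncation, size=n_points, p=weights)
        points = local_means[labels] + rng.normal(0.0, noise_sd,
                                                  size=(n_points, 2))
        batch.append((points, labels))
    return weights, class_means, batch


weights, class_means, batch = simulate_batch()
```

In data generated this way, the same class label refers to slightly different locations in different samples, which is the situation a sharing mechanism across samples has to resolve.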

Cite As
Dundar M, Yerebakan HZ, Rajwa B. Batch Discovery of Recurring Rare Classes toward Identifying Anomalous Samples. KDD. 2014;2014:223-232. doi:10.1145/2623330.2623695
Journal
KDD
Source
PMC
Type
Article
Version
Author's manuscript