Classifying the Unknown: Identification of Insects by Deep Open-set Bayesian Learning

Date
2021-09-17
Language
American English
Found At
bioRxiv
Abstract

Insects represent a large majority of biodiversity on Earth, yet only 20% of the estimated 5.5 million insect species are currently described (1). Describing a new species typically requires specific taxonomic expertise to identify the morphological characters that distinguish it from other potential species, and DNA-based methods have helped by providing additional evidence of separate species (2). Machine learning (ML) is emerging as a potential new approach to identifying new species, given that it may be more sensitive to subtle differences that humans may not perceive. Existing ML algorithms, however, are limited by image repositories that do not include undescribed species. We developed a Bayesian deep learning method for the open-set classification of species. The proposed approach forms a Bayesian hierarchy of species around corresponding genera and uses deep embeddings of images and barcodes together to identify insects at the lowest level of abstraction possible. As a proof of concept, we used a database of 32,848 insect instances from 1,040 described species, split into training and test data. The test data included 243 species not present in the training data. Our results demonstrate that, using DNA sequences and images together, insect instances of described species can be classified with 96.66% accuracy, while the genera of insect instances of undescribed species can be identified with 81.39% accuracy. The proposed deep open-set Bayesian model demonstrates a powerful new approach that can be used for the gargantuan task of identifying new insect species.

Cite As
Badirli, S., Picard, C. J., Mohler, G., Akata, Z., & Dundar, M. (2021). Classifying the Unknown: Identification of Insects by Deep Open-set Bayesian Learning (p. 2021.09.15.460492). bioRxiv. https://doi.org/10.1101/2021.09.15.460492
Type
Preprint