Classifying the Unknown: Identification of Insects by Deep Open-set Bayesian Learning

Insects represent a large majority of biodiversity on Earth, yet only 20% of the estimated 5.5 million insect species are currently described (1). Describing a new species typically requires specific taxonomic expertise to identify the morphological characters that distinguish it from other species, and DNA-based methods have provided additional evidence for separating species (2). Machine learning (ML) is emerging as a potential new approach to identifying new species, because it may be sensitive to subtle differences that humans do not perceive. Existing ML algorithms are limited by image repositories that do not include undescribed species. We developed a Bayesian deep learning method for the open-set classification of species. The proposed approach forms a Bayesian hierarchy of species around their corresponding genera and uses deep embeddings of images and DNA barcodes together to identify insects at the lowest level of abstraction possible. To demonstrate proof of concept, we used a database of 32,848 insect instances from 1,040 described species, split into training and test data. The test data included 243 species not present in the training data. Our results demonstrate that, using DNA sequences and images together, insect instances of described species can be classified with 96.66% accuracy, while the genera of insect instances of undescribed species can be identified with 81.39% accuracy. The proposed deep open-set Bayesian model demonstrates a powerful new approach that can be used for the gargantuan task of identifying new insect species.
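The idea of open-set classification over a species/genus hierarchy can be illustrated with a toy sketch. This is not the authors' model: it assumes hand-made 2-D embeddings, isotropic Gaussian classes, and fixed variances, and every name in it (`classify`, `var_within`, `var_between`, the species and genus labels) is hypothetical. Each known species is a tight Gaussian around its mean, and each genus also contributes a broader "unseen species" class whose variance is the sum of the within-species and between-species variances. A query is assigned to a species when a known species explains it best, and otherwise only to a genus.

```python
import math

def log_gauss(x, mean, var):
    """Log density of an isotropic Gaussian N(mean, var * I) at point x."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * (d * math.log(2 * math.pi * var) + sq / var)

def genus_means(species_means, species_genus):
    """Estimate each genus mean as the average of its known species means."""
    grouped = {}
    for sp, mu in species_means.items():
        grouped.setdefault(species_genus[sp], []).append(mu)
    return {g: [sum(c) / len(mus) for c in zip(*mus)] for g, mus in grouped.items()}

def classify(x, species_means, species_genus, var_within=0.05, var_between=1.0):
    """Return ("species", name) or ("genus", name) for embedding x.

    Assumed variances: var_within is the spread of samples around a species
    mean, var_between the spread of species means around a genus mean.
    """
    candidates = []
    # Known species: tight Gaussian around the species mean.
    for sp, mu in species_means.items():
        candidates.append((log_gauss(x, mu, var_within), "species", sp))
    # Unseen species of a known genus: the unknown species mean is
    # marginalized out, which widens the variance around the genus mean.
    for g, mu in genus_means(species_means, species_genus).items():
        candidates.append((log_gauss(x, mu, var_within + var_between), "genus", g))
    _, level, label = max(candidates)
    return level, label

# Toy data: two genera with two known species each (made-up embeddings).
species_means = {"A1": [0.0, 0.0], "A2": [2.0, 0.0],
                 "B1": [10.0, 10.0], "B2": [12.0, 10.0]}
species_genus = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}

print(classify([0.1, 0.0], species_means, species_genus))  # ('species', 'A1')
print(classify([1.0, 1.5], species_means, species_genus))  # ('genus', 'A')
```

The second query lies far from both known species of genus A relative to the within-species spread but close to the genus mean, so the broader genus-level class wins and the sample is reported at genus level only.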

Keywords

Automated biodiversity measurement, Ecology, Insects, Image and DNA based identification, Machine learning

Cite As

Badirli, S., Picard, C. J., Mohler, G., Akata, Z., & Dundar, M. (2021). Classifying the Unknown: Identification of Insects by Deep Open-set Bayesian Learning (p. 2021.09.15.460492). bioRxiv. https://doi.org/10.1101/2021.09.15.460492

Rights

Attribution-NonCommercial 4.0 International

Type

Preprint

Permanent Link

https://hdl.handle.net/1805/34796

DOI

https://doi.org/10.1101/2021.09.15.460492

Collections

Christine Picard
