Deep multiple instance learning for foreground speech localization in ambient audio from wearable devices

Hebbar, Rajat; Papadopoulos, Pavlos; Reyes, Ramon; Danvers, Alexander F.; Polsinelli, Angelina J.; Moseley, Suzanne A.; Sbarra, David A.; Mehl, Matthias R.; Narayanan, Shrikanth

dc.contributor.author	Hebbar, Rajat
dc.contributor.author	Papadopoulos, Pavlos
dc.contributor.author	Reyes, Ramon
dc.contributor.author	Danvers, Alexander F.
dc.contributor.author	Polsinelli, Angelina J.
dc.contributor.author	Moseley, Suzanne A.
dc.contributor.author	Sbarra, David A.
dc.contributor.author	Mehl, Matthias R.
dc.contributor.author	Narayanan, Shrikanth
dc.contributor.department	Neurology, School of Medicine	en_US
dc.date.accessioned	2022-05-18T16:13:04Z
dc.date.available	2022-05-18T16:13:04Z
dc.date.issued	2021
dc.description.abstract	In recent years, machine learning techniques have produced state-of-the-art results in several audio-related tasks. The success of these approaches is largely due to access to large open-source datasets and improved computational resources. However, a shortcoming of these methods is that they often fail to generalize well to tasks from real-life scenarios because of domain mismatch. One such task is foreground speech detection from wearable audio devices. Interfering factors such as dynamically varying environmental conditions, including background speakers and TV or radio audio, make foreground speech detection challenging. Moreover, obtaining precise moment-to-moment annotations of audio streams for analysis and model training is time-consuming and costly. In this work, we use multiple instance learning (MIL) to facilitate development of such models using annotations available at a lower time resolution (coarsely labeled). We show how MIL can be applied to localize foreground speech in coarsely labeled audio and report both bag-level and instance-level results. We also study different pooling methods and how they can be adapted to densely distributed events, as observed in our application. Finally, we show improvements from using speech activity detection embeddings as features for foreground detection.	en_US
dc.eprint.version	Final published version	en_US
dc.identifier.citation	Hebbar R, Papadopoulos P, Reyes R, et al. Deep multiple instance learning for foreground speech localization in ambient audio from wearable devices. EURASIP J Audio Speech Music Process. 2021;2021(1):7. doi:10.1186/s13636-020-00194-0	en_US
dc.identifier.uri	https://hdl.handle.net/1805/29057
dc.language.iso	en_US	en_US
dc.publisher	Springer	en_US
dc.relation.isversionof	10.1186/s13636-020-00194-0	en_US
dc.relation.journal	EURASIP Journal on Audio, Speech, and Music Processing	en_US
dc.rights	Attribution 4.0 International	*
dc.rights.uri	https://creativecommons.org/licenses/by/4.0	*
dc.source	PMC	en_US
dc.subject	Foreground speech detection	en_US
dc.subject	Multiple instance learning	en_US
dc.subject	Wearable audio	en_US
dc.subject	Weakly labeled audio	en_US
dc.title	Deep multiple instance learning for foreground speech localization in ambient audio from wearable devices	en_US
dc.type	Article	en_US

Files

Original bundle

Name: 13636_2020_Article_194.pdf
Size: 1.43 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.99 KB
Format: Item-specific license agreed to upon submission

Collections

Open Access Policy Articles
Department of Neurology Works