Flexible and Scalable Annotation Tool to Develop Scene Understanding Datasets

Recent progress in data-driven vision and language-based tasks demands developing training datasets enriched with multiple modalities representing human intelligence. The link between text and image data is one of the crucial modalities for developing AI models. The development process of such datasets in the video domain requires much effort from researchers and annotators (experts and non-experts). Researchers re-design annotation tools to extract knowledge from annotators to answer new research questions. The whole process repeats for each new question, which is time-consuming. However, over the last decade, there has been little change in how researchers and annotators interact with the annotation process. We revisit the annotation workflow and propose a concept of an adaptable and scalable annotation tool. The concept emphasizes its users’ interactivity to make annotation process design seamless and efficient. Researchers can conveniently add newer modalities to, or augment, existing datasets using the tool. Annotators can efficiently link free-form text to image objects. For conducting human-subject experiments at any scale, the tool supports data collection for attaining group ground truth. We conducted a two-group case study using a prototype tool, with the participation of 74 non-expert people. We find that the interactive linking of free-form text to image objects feels intuitive and evokes a thought process resulting in high-quality annotation. The new design shows ≈ 35% improvement in data annotation quality. In the UX evaluation, we received above-average positive feedback from 25 people regarding convenience, UI assistance, usability, and satisfaction.

Keywords

Vision and Language, Scene Understanding, Data Annotation

Cite As

Md Fazle Elahi, Renran Tian, and Xiao Luo. 2022. Flexible and Scalable Annotation Tool to Develop Scene Understanding Datasets. In Workshop on Human-In-the-Loop Data Analytics (HILDA ’22), June 12, 2022, Philadelphia, PA, USA. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3546930.3547499

Journal

HILDA '22: Proceedings of the Workshop on Human-In-the-Loop Data Analytics

Rights

Publisher Policy

Type

Article

Permanent Link

https://hdl.handle.net/1805/40310

DOI

https://doi.org/10.1145/3546930.3547499

Version

Final published version

Collections

Open Access Policy Articles
Department of Electrical and Computer Engineering Works