Benchmarking Machine Translation with Cultural Awareness

Date
2024-11
Language
English
Embargo Lift Date
Committee Members
Degree
Degree Year
Department
Grantor
Journal Title
Journal ISSN
Volume Title
Found At
ACL
Can't use the file because of accessibility barriers? Contact us with the title of the item, permanent link, and specifics of your accommodation need.
Abstract

Translating culture-related content is vital for effective cross-cultural communication. However, many culture-specific items (CSIs) often lack literal translation across languages, making it challenging to collect high-quality, diverse parallel corpora with CSI annotations. This difficulty hinders the analysis of cultural awareness of machine translation (MT) systems, including traditional neural MT and the emerging MT paradigm using large language models (LLM). To address this gap, we introduce a novel parallel corpus, enriched with CSI annotations in 6 language pairs for investigating Cultural-Aware Machine Translation—CAMT. Furthermore, we design two evaluation metrics to assess CSI translations, focusing on their pragmatic translation quality. Our findings show the superior ability of LLMs over neural MTs in leveraging external cultural knowledge for translating CSIs, especially those lacking translations in the target culture.

Description
item.page.description.tableofcontents
item.page.relation.haspart
Cite As
Yao, B., Jiang, M., Bobinac, T., Yang, D., & Hu, J. (2024). Benchmarking Machine Translation with Cultural Awareness. In Y. Al-Onaizan, M. Bansal, & Y.-N. Chen (Eds.), Findings of the Association for Computational Linguistics: EMNLP 2024 (pp. 13078–13096). Association for Computational Linguistics. https://doi.org/10.18653/v1/2024.findings-emnlp.765
ISSN
Publisher
Series/Report
Sponsorship
Major
Extent
Identifier
Relation
Journal
Findings of the Association for Computational Linguistics: EMNLP 2024
Source
Publisher
Alternative Title
Type
Article
Number
Volume
Conference Dates
Conference Host
Conference Location
Conference Name
Conference Panel
Conference Secretariat Location
Version
Final published version
Full Text Available at
This item is under embargo {{howLong}}