A fine-grained error analysis of NMT, PBMT and RBMT output for English-to-Dutch

Publication type
C1
Publication status
Published
Authors
Van Brussel, L., Tezcan, A., & Macken, L.
Editor
Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis and Takenobu Tokunaga
Series
Proceedings of the Eleventh International Conference on Language Resources and Evaluation
Pagination
3799-3804
Publisher
European Language Resources Association (ELRA) (Miyazaki, Japan)
Conference
Eleventh International Conference on Language Resources and Evaluation (Miyazaki, Japan)
Project
ArisToCAT

Abstract

This paper presents a fine-grained error comparison of the English-to-Dutch translations produced by commercial neural, phrase-based and rule-based machine translation (MT) systems. For phrase-based and rule-based MT, we use the annotated SCATE corpus of MT errors, enriching it with annotations of neural MT errors and updating the SCATE error taxonomy to fit neural MT output as well. In general, neural MT outperforms the phrase-based and rule-based systems, especially on fluency, with the exception of lexical issues. On the accuracy level, the improvements are less obvious. In the case of omissions, the target sentence does not always contain traces or clues that content is missing. This has repercussions for quality estimation or gisting that operates only on the monolingual level. Mistranslations form another well-represented error category, comprising a high number of word-sense disambiguation errors and a variety of other mistranslation errors, which makes this category more complex to annotate or post-edit.