Improving fuzzy match augmented neural machine translation in specialised domains through synthetic data

Publication type
A2
Publication status
Published
Authors
Tezcan, A., Skidanova, A., & Moerman, T.M.
Journal
PRAGUE BULLETIN OF MATHEMATICAL LINGUISTICS
Issue
122
Pagination
9-42

Abstract

Previous studies have demonstrated the effectiveness of fuzzy match (FM) augmentation in improving the performance of Neural Machine Translation (NMT) models. However, this approach exhibits limitations when applied to scenarios where limited parallel datasets are available for NMT training. This study investigates the effectiveness of leveraging additional monolingual data to improve FM-augmented NMT performance by generating synthetic parallel datasets in domain-specific scenarios. To this end, we adopt a simple strategy for combining two data augmentation methods for NMT, namely back-translation and Neural Fuzzy Repair (NFR). Experiments conducted on three language directions, namely English→Ukrainian, English→French and French→English, two domains and various dataset sizes show that this simple approach yields significant and substantial improvements in estimated translation quality.
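
As a rough illustration of how the two augmentation methods named in the abstract could be chained, the sketch below first back-translates monolingual target-side sentences into synthetic source sentences, then augments every source sentence with the target side of its best fuzzy match. This is not the authors' implementation. The `back_translate` callable, the `SEP` separator token, the 0.5 threshold and the token-level `difflib` similarity are all assumptions made for the example; the study itself may use different similarity measures, thresholds and input formats.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Optional, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)

SEP = " @@@ "  # hypothetical separator between the source and the fuzzy-match target


def best_fuzzy_match(index: int, memory: List[Pair], threshold: float) -> Optional[str]:
    """Return the target side of the most similar other entry, or None."""
    query = memory[index][0].split()
    best_target, best_score = None, threshold
    for i, (tm_source, tm_target) in enumerate(memory):
        if i == index:
            continue  # never match a sentence against itself
        # Token-level similarity ratio; a stand-in for the real FM metric.
        score = SequenceMatcher(None, query, tm_source.split()).ratio()
        if score >= best_score:
            best_target, best_score = tm_target, score
    return best_target


def build_augmented_corpus(
    parallel: List[Pair],
    monolingual_target: List[str],
    back_translate: Callable[[str], str],
    threshold: float = 0.5,
) -> List[Pair]:
    """Combine back-translation with fuzzy-match source augmentation."""
    # Step 1: back-translation turns monolingual target text into synthetic pairs.
    synthetic = [(back_translate(t), t) for t in monolingual_target]
    memory = parallel + synthetic
    # Step 2: append the target side of the best fuzzy match to each source.
    augmented = []
    for i, (source, target) in enumerate(memory):
        match = best_fuzzy_match(i, memory, threshold)
        augmented.append((source + SEP + match if match else source, target))
    return augmented
```

In practice `back_translate` would wrap a trained target-to-source NMT model, and the resulting pairs would be used to train the FM-augmented model; sentences with no match above the threshold are kept unaugmented.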