Dutch Parallel Corpus: MT Corpus and translator's aid
- Macken, L., Trushkina, J., & Rura, L.
- B. Maegaard
- Proceedings of Machine Translation Summit XI
- European Association for Machine Translation (Copenhagen, Denmark)
This paper reports on the development of the Dutch Parallel Corpus (DPC): a high-quality, sentence-aligned parallel corpus of 10 million words for the language pairs Dutch-English and Dutch-French. The corpus is composed of different text types. All processing steps, including alignment and linguistic annotation, undergo quality control at different levels. Four categories of potential DPC users can be distinguished: developers of HLT applications, linguists conducting more fundamental research, human translators, and language learners. This paper focuses on two types of intended users: MT developers and human translators. It describes the characteristics of the corpus relevant to such users, concentrating on corpus design, processing of the corpus data, and exploitation of the corpus.