Annotating the Dutch Parallel Corpus
- Paulussen, H., & Macken, L.
- Lars Ahrenberg, Jörg Tiedemann, and Martin Volk (Eds.)
- Proceedings of the Workshop on Annotation and Exploitation of Parallel Corpora (AEPC)
- NEALT Proceedings Series, 10
- Northern European Association for Language Technology (NEALT)
The Dutch Parallel Corpus (DPC) is a translation corpus containing Dutch, English and French text samples aligned at sentence level. In addition to being sentence-aligned, the corpus has been grammatically annotated, which makes it more useful in a range of domains, including natural language processing, translation research and CALL (computer-assisted language learning). In this paper, we describe the compilation of the DPC and the alignment procedures used. This is followed by a description of the annotation task for the three languages, which required different tools and different tag sets. Finally, we discuss the impact of these differing grammatical annotations on the exploitation of multilingual corpora.