Multilingual term extraction from comparable corpora: informativeness of monolingual term extraction features

Publication type
C1
Publication status
Published
Authors
Steyaert, K., & Rigouts Terryn, A.
Editors
Serge Sharoff, Pierre Zweigenbaum and Reinhard Rapp
Series
Proceedings of the 12th Workshop on Building and Using Comparable Corpora at RANLP 2019, BUCC 2019
Pagination
9-18
Conference
12th Workshop on Building and Using Comparable Corpora at RANLP 2019 (Varna, Bulgaria)

Abstract

Most research on bilingual automatic term extraction (ATE) from comparable corpora treats the two components of the task separately: monolingual automatic term extraction, and finding equivalent pairs cross-lingually. The latter usually relies on context vectors and is notoriously inaccurate for infrequent terms. This pilot study investigates whether information gathered for the former can also benefit cross-lingual linking, thereby illustrating the potential of a more holistic approach to ATE from comparable corpora, in which information is re-used across components. To test this hypothesis, an existing dataset covering three languages and four domains was expanded. A supervised binary classifier is shown to achieve robust performance, with stable results across languages and domains.
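The abstract does not list the features or the classifier used. As a rough illustration of re-using monolingual information for cross-lingual linking, the following minimal Python sketch scores candidate term pairs with a binary classifier. Everything concrete here is an assumption, not the authors' setup: the hypothetical `Candidate` class, the two per-term feature values, the use of absolute feature differences as pair features, context vectors that are assumed to be already mapped into a shared vocabulary, and the toy English/Dutch data are all invented for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """A monolingual term candidate (hypothetical feature set)."""
    text: str
    features: list[float]       # assumed monolingual ATE scores, e.g. termhood, specificity
    context: dict[str, float]   # context vector, assumed already mapped to a shared vocabulary


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse context vectors."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def pair_features(src: Candidate, tgt: Candidate) -> list[float]:
    """Re-use monolingual features for a cross-lingual pair: absolute feature
    differences (assumption: equivalents have similar profiles) plus context similarity."""
    diffs = [abs(a - b) for a, b in zip(src.features, tgt.features)]
    return diffs + [cosine(src.context, tgt.context)]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=2000):
    """Plain stochastic gradient descent on the logistic (cross-entropy) loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def predict(w, b, x) -> float:
    """Probability that a candidate pair is a translation equivalent."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy English/Dutch candidates; all values are invented for illustration.
en_hf = Candidate("heart failure", [0.90, 0.30], {"cardiac": 1.0, "patient": 1.0})
nl_hf = Candidate("hartfalen", [0.85, 0.35], {"cardiac": 1.0, "patient": 1.0})
en_bp = Candidate("blood pressure", [0.70, 0.60], {"measure": 1.0, "patient": 1.0})
nl_bp = Candidate("bloeddruk", [0.72, 0.55], {"measure": 1.0, "patient": 1.0})

pairs = [(en_hf, nl_hf, 1), (en_bp, nl_bp, 1), (en_hf, nl_bp, 0), (en_bp, nl_hf, 0)]
X = [pair_features(s, t) for s, t, _ in pairs]
y = [label for _, _, label in pairs]
w, b = train_logreg(X, y)

for (s, t, label), x in zip(pairs, X):
    print(f"{s.text} / {t.text}: p={predict(w, b, x):.2f} (gold={label})")
```

Logistic regression is only a stand-in for whichever supervised binary classifier the study uses. The point of the sketch is the pair representation: re-used monolingual features sit alongside the context-vector similarity that cross-lingual linking usually relies on alone, so the classifier is not dependent on context vectors for infrequent terms.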