Multilingual hybrid automatic term extraction : the use case of ebpracticenet

Rigouts Terryn, A., Hoste, V., Buysschaert, J., & Lefever, E.
Timothy Read, Salvador Montaner and Beatriz Sedano
Technological innovation for specialized linguistic domains : languages for digital lives and cultures, proceedings of TISLID’18
Editions universitaires européennes
3rd conference on Technological Innovation for Specialized Linguistic Domains (TISLID 18) : Languages for digital lives and cultures (Ghent, Belgium)


Accurate terminology is essential for professional communication, but it is also complex and challenging to translate. To improve multilingual communication, tools have been developed that automatically detect terms and their equivalents in other languages from parallel corpora. By means of a use case with data from ebpracticenet, we illustrate how hybrid multilingual automatic term extraction from parallel corpora works and how it can be used in a practical application such as search engine optimisation. The original aim was to use the extracted list of terms and their translation equivalents to improve the recall of a search engine by allowing multilingual searches, i.e. automatically obtaining search results for both the original search term and its translations. Two additional applications emerged when considering the data. The first was searching for related forms, using the automatically generated lemmas to group different forms of the same word. The second arose from the observation that multiple translations of the same source term reveal clusters of strongly semantically related words (e.g. the Dutch word “gif” is translated as “venom”, “toxin” and “poison”), so these can be used to find relevant documents as well. The ebpracticenet use case clearly illustrates the practical use of automatic terminology extraction from parallel corpora and shows how real-world applications can provide inspiration for further research.
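The three search applications described above (translations, related forms via lemmas, and translation clusters) can be illustrated with a minimal query-expansion sketch. This is not the system used for ebpracticenet: the data layout, the function names `build_indexes` and `expand_query`, and the inflected form "giffen" are assumptions made for illustration. Only the "gif" translations are taken from the abstract.

```python
from collections import defaultdict

# Hypothetical toy term list: (source form, source lemma, target translation).
# The three translations of "gif" come from the abstract; the inflected
# form "giffen" is an assumed example of a related form.
TERM_PAIRS = [
    ("gif", "gif", "venom"),
    ("gif", "gif", "toxin"),
    ("gif", "gif", "poison"),
    ("giffen", "gif", "poison"),
]


def build_indexes(pairs):
    """Build lookup tables from extracted term pairs."""
    lemma_of = {}                        # surface form -> lemma
    forms_of = defaultdict(set)          # lemma -> all surface forms
    translations_of = defaultdict(set)   # lemma -> all translations
    for form, lemma, target in pairs:
        lemma_of[form] = lemma
        forms_of[lemma].add(form)
        translations_of[lemma].add(target)
    return lemma_of, forms_of, translations_of


def expand_query(term, lemma_of, forms_of, translations_of):
    """Expand a search term with related forms and translations."""
    lemma = lemma_of.get(term, term)     # unknown terms map to themselves
    expanded = {term}
    expanded |= forms_of.get(lemma, set())          # related forms
    expanded |= translations_of.get(lemma, set())   # translation cluster
    return sorted(expanded)


if __name__ == "__main__":
    indexes = build_indexes(TERM_PAIRS)
    print(expand_query("giffen", *indexes))
```

In this sketch, the set of translations attached to one lemma doubles as the cluster of semantically related words, so a search for any form of "gif" would also retrieve documents mentioning "venom", "toxin" or "poison".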