SemEval-2 Task #3: Cross-Lingual Word Sense Disambiguation

Start date
Sept. 1, 2009
End date
July 31, 2010

About Cross-Lingual Word Sense Disambiguation

We formulated a "Cross-Lingual Word Sense Disambiguation" task in the framework of SemEval-2 (Evaluation Exercises on Semantic Evaluation).

SemEval-2: Cross-lingual Word Sense Disambiguation

Organizers: Els Lefever and Véronique Hoste


There is a general feeling in the WSD community that WSD should not be considered an isolated research task, but should be integrated into real NLP applications such as machine translation or multilingual information retrieval. Using translations from a corpus instead of human-defined sense labels (e.g. WordNet) makes it easier to integrate WSD into multilingual applications, and it sidesteps the sense-granularity problem, which can itself be task-dependent. Furthermore, this type of corpus-based approach is language-independent and offers a valid alternative for languages that lack sufficient sense inventories and sense-tagged corpora.

Task Description

We propose an unsupervised Word Sense Disambiguation task for English nouns by means of parallel corpora. The sense label is composed of translations in the different target languages, and the sense inventory is built up on the basis of the Europarl parallel corpus. All translations of a polysemous word are grouped into clusters, each of which represents one "sense" of that word. The sense inventory for all target nouns in the development and test data is built up manually. The translations are clustered by consensus; when the annotators cannot reach consensus on a particular translation, we apply soft clustering and assign it to two or more clusters.
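A minimal sketch of what such a sense inventory might look like as a data structure. The clusters for "bank" and the soft-clustered translation "wal" are hypothetical illustrations, not entries from the actual gold standard:

```python
# Hypothetical sense inventory: an ambiguous English noun maps, per target
# language, to a list of translation clusters ("senses").  A translation on
# which annotators could not reach consensus is soft-clustered, i.e. it
# appears in more than one cluster ("wal" below, purely for illustration).
sense_inventory = {
    "bank": {
        "dutch": [
            {"oever", "dijk", "wal"},        # river-bank sense
            {"bank", "kredietbank", "wal"},  # financial-institution sense
        ],
    }
}

def clusters_for(translation, noun, language, inventory):
    """Return every cluster containing the translation.

    A result of length > 1 means the translation was soft-clustered."""
    return [c for c in inventory[noun][language] if translation in c]
```

Under this representation, checking `len(clusters_for(...))` tells you whether a translation is unambiguously assigned or soft-clustered.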


Languages: English - Dutch, French, German, Italian, Spanish


1. Bilingual Evaluation (English - Language X)

Example:
[English] ... equivalent to giving fish to people living on the [bank] of the river ...
[Dutch] ... dit komt erop neer dat dorpelingen aan de oever van de rivier vis wordt gegeven ...
Sense Label = {oever/dijk} [Dutch]
Sense Label = {rives/rivage/bord/bords} [French]
Sense Label = {Ufer} [German]
Sense Label = {riva} [Italian]
Sense Label = {orilla} [Spanish]

2. Multilingual Evaluation (English - all target languages)

Example:
[English] ... living on the [bank] of the river ...
Sense Label = {oever/dijk, rives/rivage/bord/bords, Ufer, riva, orilla}
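The two label formats above differ only in whether one language's cluster or all five are joined. A small sketch of how the labels shown in the examples could be composed (the helper names are illustrative, not part of any released script):

```python
def format_sense_label(cluster):
    """Render one language's translation cluster, e.g. {oever/dijk}."""
    return "{" + "/".join(cluster) + "}"

def multilingual_label(per_language_clusters):
    """Render the multilingual sense label: one slash-joined cluster
    per target language, comma-separated inside a single brace pair."""
    return "{" + ", ".join("/".join(c) for c in per_language_clusters) + "}"

# Clusters for "bank" (river sense) in Dutch, French, German, Italian, Spanish,
# taken from the examples above.
per_language = [
    ["oever", "dijk"],
    ["rives", "rivage", "bord", "bords"],
    ["Ufer"],
    ["riva"],
    ["orilla"],
]
```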


As the task is formulated as an unsupervised WSD task, there is no annotated training material. Participants can use the freely available Europarl corpus, which is also used for building up the sense inventory. Participants are free to use other training corpora, but additional senses/translations that are not present in Europarl will not be included in the sense inventory used for evaluation.
We manually annotated development and test data. The development/sample data gives a preview of what the gold standard test data will look like (each ambiguous noun receives a sense label) and consists of a sample set of 5 polysemous English nouns, each with 20 example instances. The test data consist of 20 ambiguous nouns selected from the lexical substitution task, in order to make it easier for teams to participate in both tasks. For each word, 50 test instances have been manually annotated. Native speakers first decided on the correct translation cluster(s) for each test sentence and then picked their 3 preferred translations from the predefined list of Europarl translations (see Evaluation).


We use an evaluation scheme inspired by the English lexical substitution task at SemEval-2007. Evaluation is done using precision and recall. We perform both a "best result" evaluation (the first translation returned by a system) and a more relaxed "top five" evaluation (the first five translations returned by a system). In order to assign weights to the candidate translations in the answer cluster(s) for each test sentence, three native speakers pick the three most appropriate translations from the predefined sense inventory; a translation's weight reflects how many annotators chose it (3 if chosen by all three, 2 if chosen by two, 1 if chosen by one).
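A simplified sketch of how a lexsub-style weighted "best" precision could be computed from those annotator weights. This is an assumption-laden illustration, not the official evaluation script: the function name, input shapes, and exact normalisation are hypothetical.

```python
def best_precision(system, gold):
    """Sketch of lexsub-style "best" precision (illustrative only).

    system: item id -> list of translations the system returned as "best"
    gold:   item id -> {translation: weight}, weights 1-3 from the three
            native annotators.

    Each answered item earns the summed gold weight of the system's answers,
    divided by the number of answers and by the total gold weight for that
    item; precision averages this credit over answered items only."""
    scores = []
    for item, answers in system.items():
        if not answers:
            continue  # unanswered items count against recall, not precision
        total_gold = sum(gold[item].values())
        credit = sum(gold[item].get(a, 0) for a in answers)
        scores.append(credit / len(answers) / total_gold)
    return sum(scores) / len(scores) if scores else 0.0
```

Under the same assumptions, recall would divide the summed credit by the number of gold items rather than the number of answered items, and the "top five" variant would simply allow up to five answers per item.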


The trial and test data, as well as the evaluation script and the six-lingual sentence-aligned Europarl sub-corpus, can be downloaded here.