On this page, you can download the EmoWOZ corpus: a text-based conversational corpus in the domain of customer service, collected through the Wizard of Oz technique. In our Wizard of Oz experiment, participants believed they were interacting with different versions of an autonomous chatbot (called Chatty), while the system was in reality fully controlled by a human operator (henceforth: wizard). Each dialogue was grounded in an event description associated with a begin sentiment (neutral or negative), and wizards were instructed to steer the conversation towards a predefined end sentiment (positive, neutral or negative). Each participant had 12 conversations, which were subsequently annotated with emotion categories and dimensional valence-arousal-dominance scores. In total, the corpus contains 2,148 text-based dialogues between 179 participants and wizards.
While the original version of the corpus was collected in Dutch (NL), an English (EN) translation is also made available. This translated version was obtained by first automatically translating the Dutch dialogues to English with DeepL. The resulting translations were then manually post-edited by three student workers who studied at the Department of Translation, Interpreting and Communication of Ghent University.
Finally, both the prestudy for this corpus and the full experiment were approved by the Ethics Committee of the Faculty of Arts and Philosophy at Ghent University. After completing the final conversation, participants were debriefed and the Wizard of Oz setup was revealed to them. At this point, they were again given the option to withdraw their data from the corpus.
The data collection and annotation procedures are described in the associated paper(s):
Labat, S., Ackaert, N., Demeester, T., & Hoste, V. (2022). Variation in the Expression and Annotation of Emotions: a Wizard of Oz Pilot Study. In G. Abercrombie, V. Basile, S. Tonelli, V. Rieser, & A. Uma (Eds.), Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @LREC2022 (pp. 66–72). Marseille, France: European Language Resources Association (ELRA).
Please note that by downloading the data you agree to the following terms and conditions:
- The authors and their affiliated institutions make no warranties regarding the datasets provided. They cannot be held liable for providing access to the datasets or the usage of the datasets.
- The dataset should only be used for scientific or research purposes. Any other use is explicitly prohibited.
- The datasets must not be redistributed or shared, in part or in full, with any third party. Please refer interested parties to this page instead.
- If you use any of the datasets, you agree to cite the associated paper(s) (when published).