Thomas Demeester, Véronique Hoste and Sofie Labat
Request access

About EmoTwiCS

On this page, you can get access to the EmoTwiCS corpus. EmoTwiCS is a corpus of 9,489 Dutch customer service dialogues that were scraped from Twitter. In our business-oriented corpus, we view emotions as dynamic attributes of the customer that can change at each utterance of the conversation. EmoTwiCS is annotated for:

  • fine-grained emotions experienced by customers, which are annotated with:
    • 28 emotion labels (multilabel setup),
    • 9 emotion clusters (multilabel setup),
    • valence scores on a 5-point scale,
    • arousal scores on a 5-point scale,
    • dominance scores on a 5-point scale.
  • the cause of the conversation, namely the event happening prior to the interaction that causes a customer to contact the company. If a conversation has a cause, it is annotated with one of 8 categories (multiclass setup).
  • response strategies used by the company operator, which are annotated with 8 categories (multilabel setup).
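
To make the annotation layers above concrete, here is a minimal Python sketch of how one annotated conversation could be represented in memory. It is an illustration only, not the distributed file format of the corpus: the class names, field names, and placeholder label strings are all hypothetical, and the 1 to 5 integer range for valence, arousal, and dominance is an assumption about how the 5-point scales are encoded.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Union


@dataclass
class CustomerTurn:
    """One customer utterance with its emotion annotations (hypothetical layout)."""
    text: str
    valence: int    # 5-point scale; the 1..5 encoding is an assumption
    arousal: int    # 5-point scale; the 1..5 encoding is an assumption
    dominance: int  # 5-point scale; the 1..5 encoding is an assumption
    # Multilabel: zero or more of the 28 emotion labels may apply.
    emotions: Set[str] = field(default_factory=set)
    # Multilabel: zero or more of the 9 emotion clusters may apply.
    emotion_clusters: Set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        for name in ("valence", "arousal", "dominance"):
            score = getattr(self, name)
            if not 1 <= score <= 5:
                raise ValueError(f"{name} must be in 1..5, got {score}")


@dataclass
class OperatorTurn:
    """One operator utterance with its response strategies (hypothetical layout)."""
    text: str
    # Multilabel: zero or more of the 8 response strategy categories.
    response_strategies: Set[str] = field(default_factory=set)


@dataclass
class Conversation:
    """A dialogue with an optional cause and an ordered list of turns."""
    # Multiclass: at most one of the 8 cause categories, or None if no cause.
    cause: Optional[str]
    turns: List[Union[CustomerTurn, OperatorTurn]] = field(default_factory=list)

    def customer_turns(self) -> List[CustomerTurn]:
        """Return only the customer turns, in conversation order."""
        return [t for t in self.turns if isinstance(t, CustomerTurn)]


# Placeholder strings below are not real EmoTwiCS labels.
example = Conversation(
    cause="cause_category_placeholder",
    turns=[
        CustomerTurn("first customer message", valence=2, arousal=4, dominance=3,
                     emotions={"emotion_label_placeholder"}),
        OperatorTurn("operator reply",
                     response_strategies={"strategy_placeholder"}),
        CustomerTurn("second customer message", valence=4, arousal=2, dominance=3),
    ],
)
```

Because emotions are attached to each customer turn rather than to the conversation as a whole, reading the scores from `example.customer_turns()` in order gives the kind of per-utterance emotion trajectory the corpus is designed to model.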

The data collection, analysis, and machine learning baselines are described in the following papers. If you use this dataset, you agree to cite the associated papers:

Labat, S., Demeester, T., & Hoste, V. (2023). EmoTwiCS: a corpus for modelling emotion trajectories in Dutch customer service dialogues on Twitter. Language Resources and Evaluation. https://doi.org/10.1007/s10579-023-09700-0

Labat, S., Hadifar, A., Demeester, T., & Hoste, V. (2022). An emotional journey: detecting emotion trajectories in Dutch customer service dialogues. Proceedings of the Eighth Workshop on Noisy User-Generated Text (W-NUT 2022), 106–112. Gyeongju, Republic of Korea: Association for Computational Linguistics (ACL). https://aclanthology.org/2022.wnut-1.12/