LT3 at SemEval-2020 Task 7: comparing feature-based and transformer-based approaches to detect funny headlines

Publication type
C1
Publication status
Published
Authors
Vanroy, B., Labat, S., Kaminska, O. K., Lefever, E., & Hoste, V.
Series
Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval 2020)
Pagination
1033-1040
Publisher
International Committee for Computational Linguistics
Conference
Fourteenth Workshop on Semantic Evaluation (SemEval 2020) (Barcelona, Spain)

Abstract

This paper presents two systems for sub-task 1 of SemEval-2020 Task 7, Assessing Humor in Edited News Headlines, where the aim was to estimate the intensity of humor generated in edited headlines. Our first system is a feature-based machine learning system that combines different types of information (e.g. word embeddings, string similarity, part-of-speech tags, perplexity scores, named entity recognition) in a Nu Support Vector Regressor (NuSVR). The second system is a deep learning-based approach that uses the pre-trained language model RoBERTa to learn latent features in the news headlines that are useful for predicting the funniness of each headline. The latter system was our final submission to the competition and ranked seventh among the 49 participating teams, with a root-mean-square error (RMSE) of 0.5253.
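
For readers unfamiliar with the evaluation metric named in the abstract, the following is a minimal sketch of how an RMSE between gold and predicted funniness scores could be computed. It is not the task's official scorer or the authors' code; the function name `rmse` and the example scores are hypothetical and chosen only for illustration.

```python
import math


def rmse(gold, predicted):
    """Root-mean-square error between two equal-length sequences of scores."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one score is required")
    # Square each per-headline error, average the squares, then take the root.
    squared_errors = [(g - p) ** 2 for g, p in zip(gold, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical funniness scores, for illustration only (not task data).
gold_scores = [0.2, 1.0, 1.8, 0.6]
system_scores = [0.5, 0.9, 1.2, 0.8]
print(round(rmse(gold_scores, system_scores), 4))
```

A lower value means the predicted scores lie closer to the gold scores, which is why systems in the shared task are ranked by ascending RMSE.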