Exploring LLMs’ capabilities for error detection in Dutch L1 and L2 writing products

Publication type
A2
Publication status
Published
Authors
Kruijsbergen, J., Van Geertruyen, S., Hoste, V., & De Clercq, O.
Journal
Computational Linguistics in the Netherlands Journal
Volume
13
Pagination
173-191

Abstract

This research examines the capabilities of Large Language Models (LLMs) for writing error detection, which can be seen as a first step towards automated writing support. Our work focuses on Dutch writing error detection, targeting two envisaged groups of end-users: L1 and L2 adult speakers of Dutch. We relied on proprietary L1 and L2 datasets comprising writing products annotated with a variety of writing errors. Following recent paradigms in NLP research, we experimented with both a fine-tuning approach combining different monolingual (BERTje, RobBERT) and multilingual (mBERT, XLM-RoBERTa) models, and a zero-shot approach through prompting a generative autoregressive language model (GPT-3.5). The results reveal that the fine-tuning approach substantially outperforms the zero-shot approach, both for L1 and L2, though considerable room for improvement remains.
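The fine-tuning approach described above typically frames error detection as token-level sequence labelling. As a minimal illustrative sketch (not the authors' actual pipeline or data schema), the snippet below shows how character-offset error annotations might be converted to BIO labels before fine-tuning an encoder model such as BERTje or RobBERT; the span format and label names are assumptions for illustration only.

```python
# Hypothetical sketch: mapping character-level error annotations to
# token-level BIO labels, a common preprocessing step for fine-tuning
# encoder models on error detection. The annotation format shown here
# is an illustrative assumption, not the datasets' actual schema.

def spans_to_bio(tokens, error_spans):
    """tokens: list of (start, end, text) tuples with character offsets.
    error_spans: list of (start, end) character spans marked as errors.
    Returns one BIO label per token."""
    labels = []
    for t_start, t_end, _ in tokens:
        label = "O"
        for s_start, s_end in error_spans:
            if t_start >= s_start and t_end <= s_end:
                # First token of the span gets B-, continuations get I-
                label = "B-ERR" if t_start == s_start else "I-ERR"
                break
        labels.append(label)
    return labels

# Example: an L2 sentence with one annotated error span over "gegaan naar"
tokens = [(0, 2, "Ik"), (3, 6, "heb"), (7, 13, "gegaan"),
          (14, 18, "naar"), (19, 25, "school")]
print(spans_to_bio(tokens, [(7, 18)]))
# -> ['O', 'O', 'B-ERR', 'I-ERR', 'O']
```

With labels in this form, the task reduces to standard token classification, which is how masked-language-model encoders are usually fine-tuned for detection tasks.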