This page provides access to the EventDNA corpus, a Dutch-language corpus of 1,773 news documents in which news events, entities, IPTC Media Topic codes and coreference links have been manually annotated following these guidelines.
The corpus can be used to reproduce the results reported in Colruyt, C., De Clercq, O., Desot, T. and Hoste, V. (forthcoming). EventDNA: a dataset for Dutch news event extraction as a basis for news diversification. To appear in Language Resources and Evaluation. The data for the inter-annotator agreement (IAA) study has also been made available.
The code for both the event extraction experiments and the IAA study can be found on GitHub: https://github.com/NewsDNA-LT3/.github.
You can access both datasets by filling in your credentials at the top of this page. Please note that by downloading the data you agree to the following terms and conditions: