KIND: an Italian Multi-Domain Dataset for Named Entity Recognition

International Conference on Language Resources and Evaluation (LREC), 2021

Abstract

In this paper we present KIND, an Italian dataset for Named Entity Recognition. It contains more than one million tokens annotated with three classes: persons, locations, and organizations. Most of the dataset (around 600K tokens) carries manual gold annotations and spans three domains: news, literature, and political discourse. Texts and annotations can be downloaded for free from the GitHub repository.
