KIND: an Italian Multi-Domain Dataset for Named Entity Recognition
International Conference on Language Resources and Evaluation (LREC), 2021
Abstract
In this paper we present KIND, an Italian dataset for Named Entity Recognition. It contains more than one million tokens annotated with three classes: persons, locations, and organizations. Most of the dataset (around 600K tokens) carries manual gold annotations across three domains: news, literature, and political discourses. Texts and annotations can be downloaded for free from the GitHub repository.
