Accurate Part-Of-Speech Induction by Inter-Annotator Agreement

North American Chapter of the Association for Computational Linguistics (NAACL), 2018

20 April 2018

Abstract

We tackle part-of-speech induction with an information-theoretic framework that models the dynamics of an inter-annotator agreement process. Our model consists of a pair of annotator networks whose objective is to maximize the mutual information between their annotations. Training the networks with standard stochastic optimization is challenging, but we find that a simplification of the model that admits a variational approximation is empirically effective. We showcase the strength of our approach by achieving new state-of-the-art performance on a multitude of datasets and languages. Using a simple architecture that captures morphological and contextual information, our model clearly outperforms previous work that relies on carefully hand-crafted features.
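
To make the agreement objective concrete, the sketch below computes the empirical mutual information between two sequences of hard labels, one per annotator. It only illustrates the quantity being maximized. It is not the paper's training procedure, which uses neural annotators and a variational approximation that the abstract does not detail. The function name `mutual_information` and the toy label sequences are assumptions made for this example.

```python
import math
from collections import Counter


def mutual_information(labels_a, labels_b):
    """Empirical mutual information (in nats) between two label sequences.

    Hypothetical helper for illustration: each sequence holds one hard
    label per token, as if produced by one of two annotators.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both annotators must label the same tokens")
    n = len(labels_a)
    if n == 0:
        raise ValueError("need at least one labeled token")

    joint = Counter(zip(labels_a, labels_b))  # counts of (a, b) pairs
    count_a = Counter(labels_a)               # marginal counts for annotator A
    count_b = Counter(labels_b)               # marginal counts for annotator B

    mi = 0.0
    for (a, b), c in joint.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
    return mi


# Annotators that agree up to a renaming of labels share log(2) nats
# when the two labels are balanced.
agree = mutual_information([0, 0, 1, 1], [1, 1, 0, 0])

# Annotators whose labels are statistically independent share nothing.
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Mutual information is invariant to relabeling, so the two annotators need not use the same tag names; they only need to partition the tokens consistently. This is what makes it a natural agreement measure for induction, where no gold tag inventory is given.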
