Investigating how well contextual features are captured by bi-directional recurrent neural network models

Abstract

Learning algorithms for natural language processing (NLP) tasks have traditionally relied on manually engineered contextual features. Neural network models, by contrast, learn such features automatically and have been applied successfully to several NLP tasks. Because these models operate only on vector representations of words, they require no manual feature engineering, which makes them a natural choice across domains. This flexibility, however, comes at the cost of interpretability. The motivation of this work is to better understand how well neural models capture contextual features. In particular, we analyze the performance of bi-directional recurrent neural models on a sequence tagging task by defining several measures based on the word erasure technique, and we investigate the models' ability to capture relevant features. We perform a comprehensive analysis of these measures on both general-domain and biomedical-domain datasets. Our experiments focus on important contextual words as features, but the approach extends easily to other feature types. We also investigate the positional effects of context words and show how the developed methods can be used for error analysis.
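The word erasure idea behind the proposed measures can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: `erasure_importance`, `toy_score`, and the example sentence are all hypothetical stand-ins, with a toy scoring function in place of a trained bi-directional recurrent tagger.

```python
# Sketch of word erasure: a context word's importance for the prediction at a
# target position is the drop in the model's score when that word is removed.
# (Hypothetical helper names; a real setup would call a trained tagger.)

def erasure_importance(score_fn, tokens, target_index):
    """Score change at target_index when each context word is erased."""
    base = score_fn(tokens, target_index)
    importances = []
    for i in range(len(tokens)):
        if i == target_index:
            importances.append(0.0)  # the target word itself is not erased
            continue
        erased = tokens[:i] + tokens[i + 1:]
        # positions after the erased word shift left by one
        shifted = target_index - 1 if i < target_index else target_index
        importances.append(base - score_fn(erased, shifted))
    return importances

# Toy stand-in for a trained sequence tagger: it tags "York" as a location
# more confidently when "New" immediately precedes it.
def toy_score(tokens, idx):
    score = 0.5
    if idx > 0 and tokens[idx - 1] == "New" and tokens[idx] == "York":
        score += 0.4
    return score

tokens = ["He", "visited", "New", "York", "today"]
print(erasure_importance(toy_score, tokens, target_index=3))
```

Under this toy model, erasing "New" lowers the score for tagging "York", so "New" receives a large importance value while the other context words receive none, which is the kind of signal the paper's measures aggregate over a dataset.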
