Multilingual Relation Extraction using Compositional Universal Schema

19 November 2015

Pat Verga

Abstract

When building a knowledge base (KB) of entities and relations from multiple structured KBs and text, universal schema represents the union of all input schemas by jointly embedding all relation types from the input KBs as well as textual patterns expressing relations. In previous work, each textual pattern is parametrized as a single embedding, which prevents generalization to unseen textual patterns. In this paper we employ an LSTM to compositionally capture the semantics of relational text. We demonstrate the flexibility of our approach by evaluating in a multilingual setting in which the entities in the English training data overlap with the seed KB but those in the Spanish text do not. Additional improvements are obtained by tying word embeddings across languages. In extensive experiments on the English and Spanish TAC KBP benchmark, our techniques provide substantial accuracy improvements. Furthermore, we find that training with the additional non-overlapping Spanish text also improves English relation extraction accuracy. Our approach is thus suited to broad-coverage automated knowledge base construction in low-resource domains and languages.
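
To make the compositional idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes that an entity pair and a relation are scored by the sigmoid of the dot product of their vectors, that a textual pattern's vector is the final hidden state of a plain single-layer LSTM run over its word embeddings, and that cross-language tying is done by mapping a Spanish word to the embedding row of a hypothetical English translation. The dimensions, random initialization, vocabulary, translation pairs, and helper names (`LSTMEncoder`, `embed`, `score`, `TIED`) are illustrative assumptions; nothing is trained here.

```python
import math
import random

random.seed(0)
DIM = 4  # hypothetical embedding and hidden size


def rand_vec(n):
    return [random.uniform(-0.1, 0.1) for _ in range(n)]


def rand_mat(rows, cols):
    return [rand_vec(cols) for _ in range(rows)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMEncoder:
    """Single-layer LSTM; its final hidden state encodes a textual pattern."""

    def __init__(self, dim):
        self.dim = dim
        # One weight matrix per gate (input, forget, output, candidate),
        # each applied to the concatenation [x; h_prev].
        self.W = {gate: rand_mat(dim, 2 * dim) for gate in "ifoc"}
        self.b = {gate: [0.0] * dim for gate in "ifoc"}

    def encode(self, vectors):
        h = [0.0] * self.dim
        c = [0.0] * self.dim
        for x in vectors:
            xh = x + h  # list concatenation: [x; h_prev]
            pre = {gate: [p + b for p, b in zip(matvec(self.W[gate], xh), self.b[gate])]
                   for gate in "ifoc"}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["c"]]
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h


# Hypothetical English vocabulary with one embedding per word.
word_emb = {w: rand_vec(DIM) for w in ["was", "born", "in"]}
# Hypothetical translation pairs: tying makes each Spanish word reuse
# the very same vector as its English counterpart.
TIED = {"nació": "born", "en": "in"}


def embed(token):
    return word_emb[TIED.get(token, token)]


def score(pair_vec, relation_vec):
    """Assumed universal-schema score: sigmoid of the dot product."""
    return sigmoid(sum(a * b for a, b in zip(pair_vec, relation_vec)))


encoder = LSTMEncoder(DIM)
pair = rand_vec(DIM)         # hypothetical entity-pair embedding
kb_relation = rand_vec(DIM)  # hypothetical KB relation-type embedding

# Any token sequence, including one never seen in training, gets a vector.
en_pattern = encoder.encode([embed(t) for t in ["was", "born", "in"]])
es_pattern = encoder.encode([embed(t) for t in ["nació", "en"]])

print(score(pair, kb_relation), score(pair, en_pattern), score(pair, es_pattern))
```

Because KB relation types and encoded textual patterns live in the same vector space, a single scoring function covers both; the encoder is what lets a pattern absent from training still receive a vector instead of needing its own learned embedding.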
