Many Languages, One Parser
Abstract
We train one model for dependency parsing and use it to parse competitively in several languages. The parsing model uses multilingual word clusters and multilingual word embeddings alongside learned and specified typological information, enabling generalization based on linguistic universals and typological similarities. Our model can also incorporate language-specific features (e.g., fine POS tags), still letting the parser learn language-specific behaviors. Our parser compares favorably to strong baselines in a range of data scenarios, including when the target language has a large treebank, a small treebank, or no treebank for training.
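To make the input representation concrete, here is a minimal illustrative sketch (not the authors' code) of the kind of token representation the abstract describes: a multilingual word embedding concatenated with a language-level typology vector and, when available, a language-specific fine POS feature. All names and vectors (TYPOLOGY, EMBEDDINGS, token_representation, POS_DIM) are hypothetical placeholders.

```python
from typing import Optional
import numpy as np

# Hypothetical typological feature vectors per language (e.g., binary
# features one might derive from a typology database such as WALS).
TYPOLOGY = {
    "en": np.array([1.0, 0.0, 1.0]),
    "de": np.array([1.0, 1.0, 0.0]),
}

# Hypothetical multilingual embeddings: words from different languages
# live in one shared vector space, which is what lets a single parser
# generalize across languages.
EMBEDDINGS = {
    ("en", "dog"): np.array([0.2, 0.7, 0.1, 0.4]),
    ("de", "Hund"): np.array([0.3, 0.6, 0.1, 0.5]),
}

EMB_DIM = 4
POS_DIM = 2  # size of the (optional) fine POS one-hot, assumed for the sketch

def token_representation(lang: str, word: str,
                         fine_pos: Optional[np.ndarray] = None) -> np.ndarray:
    """Concatenate multilingual embedding, typology vector, and an optional
    language-specific fine POS one-hot into one fixed-size input vector.
    Missing language-specific features are zero-padded so every language
    shares the same input dimensionality."""
    emb = EMBEDDINGS.get((lang, word), np.zeros(EMB_DIM))
    typo = TYPOLOGY[lang]
    pos = fine_pos if fine_pos is not None else np.zeros(POS_DIM)
    return np.concatenate([emb, typo, pos])

# Example: an English token with a fine POS tag, a German token without one.
print(token_representation("en", "dog", np.array([0.0, 1.0])))  # 9-dim vector
print(token_representation("de", "Hund"))                        # 9-dim vector
```

Zero-padding the language-specific slot is one simple way a single model can accept such features where a treebank provides them while remaining usable for languages that lack them.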
