Cross-Lingual Syntactic Transfer with Limited Resources

19 October 2016

Abstract

We describe a simple but effective method for cross-lingual syntactic transfer of dependency parsers, in the scenario where a large amount of translation data is not available. The method makes use of three steps: 1) a method for deriving cross-lingual word clusters that can then be used in a multilingual parser; 2) a method for transferring lexical information from a target language to source language treebanks; 3) a method for integrating these steps with the density-driven annotation projection method of Rasooli and Collins (2015). Experiments show improvements over the state-of-the-art in several languages used in previous work (Rasooli and Collins, 2015; Zhang and Barzilay, 2015; Ammar et al., 2016), in a setting where the only source of translation data is the Bible, a considerably smaller corpus than the Europarl corpus used in previous work. Results using the Europarl corpus as a source of translation data show additional improvements over the results of Rasooli and Collins (2015). We conclude with results on 38 datasets (26 languages) from the Universal Dependencies corpora: 13 datasets (10 languages) have unlabeled attachment accuracies of 80% or higher; the average unlabeled accuracy on the 38 datasets is 74.8%.
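
The unlabeled attachment accuracy reported above is the fraction of tokens whose predicted head matches the gold head, ignoring dependency labels. The following is a minimal sketch of that computation, assuming each sentence is represented as a list of head indices (1-indexed, with 0 for the root); the function name and the toy data are illustrative and not taken from the paper, and evaluation details such as punctuation handling are not specified here.

```python
from typing import Sequence


def unlabeled_attachment_score(
    gold_heads: Sequence[int], predicted_heads: Sequence[int]
) -> float:
    """Fraction of tokens whose predicted head equals the gold head.

    Both sequences hold one head index per token (0 = root). This is an
    illustrative helper, not the evaluation script used in the paper.
    """
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold_heads:
        raise ValueError("cannot score an empty sentence")
    # Count positions where the predicted head agrees with the gold head.
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)


# Toy five-token sentence: the parser gets the head of token 4 wrong.
gold = [2, 0, 2, 5, 3]
predicted = [2, 0, 2, 3, 3]
score = unlabeled_attachment_score(gold, predicted)
print(f"UAS: {score:.1%}")
```

In practice the score is usually accumulated over all tokens in a test set rather than averaged per sentence; the sketch scores a single sentence only to keep the example short.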
