246

Syntactic Substitutability as Unsupervised Dependency Syntax

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Siva Reddy
Abstract

Syntax is a latent hierarchical structure which underpins the robust and compositional nature of human language. In this work, we further explore the hypothesis that syntactic dependencies can be represented in the attention distributions of language models trained on text and propose a new method to induce these structures theory-agnostically. Instead of modeling syntactic relations as defined by annotation schemata, we model a more general property implicit in the definition of dependency relations, syntactic substitutability. This property captures the fact that the words at either end of a syntactic dependency can be substituted with words from the same syntactic category, defining a set of syntactically-invariant sentences whose representations are then used as the basis for parsing. We demonstrate that our method results in both qualitative and quantitative gains, for example achieving 78.3% recall on a long-distance subject-verb agreement task vs. 8.5% with a previous unsupervised parsing method.

View on arXiv
Comments on this paper