
Why So Down? The Role of Negative (and Positive) Pointwise Mutual Information in Distributional Semantics

Abstract

In distributional semantics, the pointwise mutual information (PMI) weighting of the cooccurrence matrix performs far better than raw counts. There is, however, an issue with unobserved pair cooccurrences, as PMI goes to negative infinity. This problem is aggravated by unreliable statistics from finite corpora, which lead to a large number of such pairs. A common practice is to clip negative PMI (−PMI) at 0, also known as Positive PMI (PPMI). In this paper, we investigate alternative ways of dealing with −PMI and, more importantly, study the role that negative information plays in the performance of a low-rank, weighted factorization of different PMI matrices. Using various semantic and syntactic tasks as probes into models which use either negative or positive PMI (or both), we find that most of the encoded semantics and syntax come from positive PMI, in contrast to −PMI, which contributes almost exclusively syntactic information. Our findings deepen our understanding of distributional semantics, while also introducing novel PMI variants and grounding the popular PPMI measure.
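
For concreteness, the standard definitions underlying the abstract are PMI(w, c) = log(P(w, c) / (P(w)P(c))) and PPMI(w, c) = max(PMI(w, c), 0). The following Python sketch (not from the paper; the function name pmi_matrix and the toy counts are illustrative) computes both from a raw cooccurrence count matrix and shows how unobserved pairs produce −∞ entries before clipping.

```python
# Minimal sketch of the standard PMI/PPMI computation over a
# word-by-context cooccurrence count matrix (illustrative, not the
# paper's implementation).
import numpy as np

def pmi_matrix(counts, positive=False):
    """(P)PMI from a |V_w| x |V_c| cooccurrence count matrix."""
    total = counts.sum()
    p_wc = counts / total                       # joint P(w, c)
    p_w = p_wc.sum(axis=1, keepdims=True)       # marginal P(w)
    p_c = p_wc.sum(axis=0, keepdims=True)       # marginal P(c)
    with np.errstate(divide="ignore"):          # log(0) -> -inf for unseen pairs
        pmi = np.log(p_wc) - np.log(p_w * p_c)
    if positive:
        pmi = np.maximum(pmi, 0.0)              # PPMI: clip negative PMI at 0
    return pmi

# Toy 2x3 cooccurrence matrix; the zero entry yields PMI = -inf,
# which the PPMI variant clips to 0.
counts = np.array([[10.0, 0.0, 3.0],
                   [2.0, 5.0, 1.0]])
print(pmi_matrix(counts))                 # raw PMI, contains -inf
print(pmi_matrix(counts, positive=True))  # PPMI, all entries >= 0
```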
