Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations
Annual Meeting of the Association for Computational Linguistics (ACL), 2016
Abstract
In this paper, we propose LexVec, a new method for generating distributed word representations that uses low-rank, weighted factorization of the Positive Point-wise Mutual Information matrix via stochastic gradient descent, employing a weighting scheme that assigns heavier penalties for errors on frequent co-occurrences while still accounting for negative co-occurrence. Evaluation on word similarity and analogy tasks shows that LexVec matches and often outperforms state-of-the-art methods on many of these tasks.
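To make the ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' LexVec implementation. It builds a Positive Pointwise Mutual Information (PPMI) matrix from a toy co-occurrence count matrix and fits a low-rank factorization to it with stochastic gradient descent. The function names `ppmi_matrix` and `factorize_sgd`, the hyperparameters, and the toy counts are all hypothetical. The sampling rule (cells drawn in proportion to `count + 1`) is an assumption that only imitates the idea of penalizing errors on frequent co-occurrences more heavily while still visiting zero-count pairs. It is not the paper's window sampling and negative sampling procedure.

```python
import numpy as np


def ppmi_matrix(counts):
    """Positive PMI of a word-by-context co-occurrence count matrix."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)   # word marginals
    col = counts.sum(axis=0, keepdims=True)   # context marginals
    expected = row * col / total              # counts expected under independence
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts / expected)
    pmi[~np.isfinite(pmi)] = 0.0              # zero counts give -inf or nan; map to 0
    return np.maximum(pmi, 0.0)               # keep only positive associations


def factorize_sgd(ppmi, counts, dim=2, epochs=200, lr=0.05, seed=0):
    """Fit W @ C.T to ppmi by SGD on the squared reconstruction error."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    n_words, n_ctx = ppmi.shape
    W = rng.normal(scale=0.1, size=(n_words, dim))   # word vectors
    C = rng.normal(scale=0.1, size=(n_ctx, dim))     # context vectors

    # ASSUMPTION (illustrative weighting, not the paper's scheme):
    # sample cells in proportion to count + 1, so frequent pairs are
    # updated more often while zero-count pairs are still visited.
    weights = (counts + 1.0).ravel()
    probs = weights / weights.sum()
    n_cells = ppmi.size

    for _ in range(epochs):
        for idx in rng.choice(n_cells, size=n_cells, p=probs):
            i, j = divmod(idx, n_ctx)
            err = W[i] @ C[j] - ppmi[i, j]
            grad_w = err * C[j]
            grad_c = err * W[i]
            W[i] -= lr * grad_w
            C[j] -= lr * grad_c
    return W, C


# Toy usage: two words that each co-occur only with their own context.
counts = np.array([[4, 0], [0, 4]])
M = ppmi_matrix(counts)
W, C = factorize_sgd(M, counts, epochs=500)
print(np.round(W @ C.T, 2))
```

In this sketch the heavier treatment of frequent pairs comes purely from how often a cell is sampled, so no explicit per-cell weight appears in the loss. A real corpus would need sparse storage and sampling directly from text windows instead of a dense count matrix.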
