
Single Pass Entrywise-Transformed Low Rank Approximation

Abstract

In applications such as natural language processing or computer vision, one is given a large $n \times d$ matrix $A = (a_{i,j})$ and would like to compute a matrix decomposition, e.g., a low rank approximation, of a function $f(A) = (f(a_{i,j}))$ applied entrywise to $A$. A very important special case is the likelihood function $f(A) = \log(|a_{i,j}| + 1)$. A natural way to do this would be to simply apply $f$ to each entry of $A$ and then compute the matrix decomposition, but this requires storing all of $A$ as well as multiple passes over its entries. Recent work of Liang et al. shows how to find a rank-$k$ factorization to $f(A)$ for an $n \times n$ matrix $A$ using only $n \cdot \operatorname{poly}(\epsilon^{-1} k \log n)$ words of memory, with overall error $10\|f(A) - [f(A)]_k\|_F^2 + \operatorname{poly}(\epsilon/k)\|f(A)\|_{1,2}^2$, where $[f(A)]_k$ is the best rank-$k$ approximation to $f(A)$ and $\|f(A)\|_{1,2}^2$ is the square of the sum of Euclidean lengths of the rows of $f(A)$. Their algorithm uses three passes over the entries of $A$, and the authors pose the open question of obtaining an algorithm with $n \cdot \operatorname{poly}(\epsilon^{-1} k \log n)$ words of memory using only a single pass. In this paper we resolve this open question, obtaining the first single-pass algorithm for this problem and for the same class of functions $f$ studied by Liang et al. Moreover, our error is $\|f(A) - [f(A)]_k\|_F^2 + \operatorname{poly}(\epsilon/k)\|f(A)\|_F^2$, where $\|f(A)\|_F^2$ is the sum of squares of the Euclidean lengths of the rows of $f(A)$. Thus our error is significantly smaller: it removes the factor of $10$, and $\|f(A)\|_F^2 \leq \|f(A)\|_{1,2}^2$. We also give an algorithm for regression, pointing out an error in previous work, and empirically validate our results.
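For reference, here is a minimal NumPy sketch of the naive baseline described in the abstract, not of the paper's single-pass algorithm: materialize $f(A) = \log(|a_{i,j}| + 1)$ entrywise, then take the best rank-$k$ approximation via a truncated SVD. The function name and test matrix are illustrative; the point is that this baseline stores all $n \cdot d$ entries of $f(A)$, which is exactly the memory cost the paper avoids.

```python
import numpy as np

def naive_transformed_low_rank(A, k, f=lambda x: np.log(np.abs(x) + 1.0)):
    # Materialize f(A) entrywise -- an n*d-word storage cost, which the
    # paper's single-pass, n*poly(k log n / eps)-memory algorithm avoids.
    fA = f(A)
    # Best rank-k approximation [f(A)]_k via truncated SVD.
    U, s, Vt = np.linalg.svd(fA, full_matrices=False)
    fA_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
    # Squared Frobenius error ||f(A) - [f(A)]_k||_F^2.
    err = np.linalg.norm(fA - fA_k, ord="fro") ** 2
    return fA_k, err

# Illustrative usage on a random matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 200))
fA_k, err = naive_transformed_low_rank(A, k=10)
```

For intuition on why the new additive term is smaller: writing $r_i$ for the Euclidean length of row $i$ of $f(A)$, we have $\|f(A)\|_F^2 = \sum_i r_i^2$ while $\|f(A)\|_{1,2}^2 = (\sum_i r_i)^2$, so the Frobenius term is never larger.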
