
Low-Rank Approximation with $1/\epsilon^{1/3}$ Matrix-Vector Products

Abstract

We study iterative methods based on Krylov subspaces for low-rank approximation under any Schatten-$p$ norm. Here, given access to a matrix $A$ through matrix-vector products, an accuracy parameter $\epsilon$, and a target rank $k$, the goal is to find a rank-$k$ matrix $Z$ with orthonormal columns such that $\| A(I - ZZ^\top)\|_{S_p} \leq (1+\epsilon)\min_{U^\top U = I_k} \|A(I - U U^\top)\|_{S_p}$, where $\|M\|_{S_p}$ denotes the $\ell_p$ norm of the singular values of $M$. For the special cases of $p=2$ (Frobenius norm) and $p = \infty$ (spectral norm), Musco and Musco (NeurIPS 2015) obtained an algorithm based on Krylov methods that uses $\tilde{O}(k/\sqrt{\epsilon})$ matrix-vector products, improving on the naive $\tilde{O}(k/\epsilon)$ dependence obtainable by the power method, where $\tilde{O}$ suppresses $\mathrm{poly}(\log(dk/\epsilon))$ factors. Our main result is an algorithm that uses only $\tilde{O}(kp^{1/6}/\epsilon^{1/3})$ matrix-vector products and works for all $p \geq 1$. For $p = 2$, our bound improves the previous $\tilde{O}(k/\epsilon^{1/2})$ bound to $\tilde{O}(k/\epsilon^{1/3})$. Since the Schatten-$p$ and Schatten-$\infty$ norms are the same up to a $(1+\epsilon)$-factor when $p \geq (\log d)/\epsilon$, our bound recovers the result of Musco and Musco for $p = \infty$. Further, we prove a matrix-vector query lower bound of $\Omega(1/\epsilon^{1/3})$ for any fixed constant $p \geq 1$, showing that, surprisingly, $\tilde{\Theta}(1/\epsilon^{1/3})$ is the optimal complexity for constant $k$. To obtain our results, we introduce several new techniques, including optimizing over multiple Krylov subspaces simultaneously, and pinching inequalities for partitioned operators. Our lower bound for $p \in [1,2]$ uses the Araki-Lieb-Thirring trace inequality, whereas for $p>2$, we appeal to a norm-compression inequality for aligned partitioned operators.
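To make the problem statement concrete, the following is a minimal NumPy sketch of a plain block Krylov iteration in the spirit of the Musco and Musco baseline, together with a check of the Schatten-$p$ error ratio against the optimum. It is not the algorithm of this paper: the function names (`schatten_norm`, `block_krylov_lowrank`, `error_ratio`), the iteration count `q`, and the choice to pick $Z$ from the top right singular vectors of $AQ$ are all illustrative assumptions, and that final choice is only known to be the best option within the subspace for $p = 2$.

```python
import numpy as np

def schatten_norm(M, p):
    """l_p norm of the singular values of M (p = np.inf gives the spectral norm)."""
    s = np.linalg.svd(M, compute_uv=False)
    return float(s.max()) if np.isinf(p) else float((s ** p).sum() ** (1.0 / p))

def block_krylov_lowrank(A, k, q, rng):
    """Hypothetical baseline sketch: return Z (d x k, orthonormal columns) chosen
    inside span{A^T G, (A^T A) A^T G, ..., (A^T A)^q A^T G} for Gaussian G."""
    n, d = A.shape
    G = rng.standard_normal((n, k))
    # Orthonormalize each block; this keeps the same Krylov subspace
    # but avoids the columns growing like sigma^(2q).
    block, _ = np.linalg.qr(A.T @ G)
    blocks = [block]
    for _ in range(q):
        block, _ = np.linalg.qr(A.T @ (A @ block))
        blocks.append(block)
    # Orthonormal basis for the whole Krylov subspace (needs (q + 1) * k <= d).
    Q, _ = np.linalg.qr(np.hstack(blocks))
    # Within span(Q), take the top-k right singular directions of A Q.
    _, _, Vt = np.linalg.svd(A @ Q, full_matrices=False)
    return Q @ Vt[:k].T

def error_ratio(A, Z, k, p):
    """Ratio of ||A(I - Z Z^T)||_{S_p} to the optimum, which is the l_p norm
    of the singular values of A beyond the k largest."""
    d = A.shape[1]
    err = schatten_norm(A @ (np.eye(d) - Z @ Z.T), p)
    tail = np.linalg.svd(A, compute_uv=False)[k:]
    opt = float(tail.max()) if np.isinf(p) else float((tail ** p).sum() ** (1.0 / p))
    return err / opt
```

A ratio of at most $1+\epsilon$ from `error_ratio` corresponds to the guarantee in the abstract. The sketch only reaches $A$ through products with $A$ and $A^\top$ until the final evaluation step, which forms $A(I - ZZ^\top)$ explicitly purely for checking.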
