
Perturbation Analysis of Randomized SVD and its Applications to Statistics

30 pages (main), 48 pages (appendix), 2 pages (bibliography), 9 figures, 2 tables
Abstract

Randomized singular value decomposition (RSVD) is a class of computationally efficient algorithms for computing the truncated SVD of large data matrices. Given an $m \times n$ matrix $\widehat{\mathbf{M}}$, the prototypical RSVD algorithm outputs an approximation of the $k$ leading left singular vectors of $\widehat{\mathbf{M}}$ by computing the SVD of $\widehat{\mathbf{M}} (\widehat{\mathbf{M}}^{\top} \widehat{\mathbf{M}})^{g} \mathbf{G}$; here $g \geq 1$ is an integer and $\mathbf{G} \in \mathbb{R}^{n \times \widetilde{k}}$ is a random Gaussian sketching matrix with $\widetilde{k} \geq k$. In this paper we derive upper bounds for the $\ell_2$ and $\ell_{2,\infty}$ distances between the exact left singular vectors $\widehat{\mathbf{U}}$ of $\widehat{\mathbf{M}}$ and their approximation $\widehat{\mathbf{U}}_g$ (obtained via RSVD), as well as entrywise error bounds when $\widehat{\mathbf{M}}$ is projected onto $\widehat{\mathbf{U}}_g \widehat{\mathbf{U}}_g^{\top}$. These bounds depend on the singular value gap and the number of power iterations $g$: a smaller gap requires a larger value of $g$ to guarantee convergence of the $\ell_2$ and $\ell_{2,\infty}$ distances. We apply our theoretical results to settings where $\widehat{\mathbf{M}}$ is an additive perturbation of some unobserved signal matrix $\mathbf{M}$. In particular, we obtain nearly optimal convergence rates and asymptotic normality for RSVD in three inference problems: subspace estimation and community detection in random graphs, noisy matrix completion, and PCA with missing data.
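The prototypical RSVD algorithm described in the abstract can be sketched in a few lines of NumPy. This is an illustrative implementation under my own assumptions (function name, oversampling amount, and seeding are choices made here, not taken from the paper); it forms $\widehat{\mathbf{M}} (\widehat{\mathbf{M}}^{\top} \widehat{\mathbf{M}})^{g} \mathbf{G}$ by repeated matrix products rather than ever materializing $\widehat{\mathbf{M}}^{\top} \widehat{\mathbf{M}}$:

```python
import numpy as np

def rsvd_leading_vectors(M, k, k_tilde=None, g=1, rng=None):
    """Approximate the k leading left singular vectors of M via the
    prototypical RSVD: the SVD of M (M^T M)^g G for a Gaussian sketch G.
    (Illustrative sketch; names and defaults are this example's choices.)"""
    rng = np.random.default_rng(rng)
    m, n = M.shape
    if k_tilde is None:
        k_tilde = min(n, k + 5)  # mild oversampling, k_tilde >= k
    # Random Gaussian sketching matrix G in R^{n x k_tilde}
    G = rng.standard_normal((n, k_tilde))
    # Z <- (M^T M)^g G, applied factor by factor (g power iterations)
    Z = G
    for _ in range(g):
        Z = M.T @ (M @ Z)
    # Y = M (M^T M)^g G is m x k_tilde; its exact thin SVD is cheap
    Y = M @ Z
    U_g, _, _ = np.linalg.svd(Y, full_matrices=False)
    return U_g[:, :k]
```

For numerical stability with large $g$, practical implementations typically re-orthonormalize (e.g. via QR) between power iterations; the plain products above suffice to convey the estimator $\widehat{\mathbf{U}}_g$ that the bounds in the paper concern.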
