v2 (latest)

$k$-SVD with Gradient Descent

Main: 15 pages
1 figure
Bibliography: 4 pages
4 tables
Appendix: 30 pages
Abstract

The emergence of modern compute infrastructure for iterative optimization has spurred great interest in optimization-based approaches to scalable computation of the $k$-SVD, i.e., the $k \geq 1$ largest singular values and corresponding singular vectors of a matrix of rank $d \geq 1$. Despite much exciting recent work, all prior approaches fall short in this pursuit. Specifically, the existing results either apply only to the exact-parameterized ($k = d$) or over-parameterized ($k > d$) settings; or establish only local convergence guarantees; or use a step size that requires problem-instance-specific, oracle-provided information. In this work, we complete this pursuit by providing a gradient-descent method with a simple, universal rule for step-size selection (akin to pre-conditioning) that provably finds the $k$-SVD of a matrix of any rank $d \geq 1$. We establish that gradient descent with random initialization enjoys global linear convergence for any $k, d \geq 1$. Our convergence analysis reveals that the gradient method has an attractive region, within which it behaves like Heron's method (a.k.a. the Babylonian method). Our analytic results about this attractive region imply that the gradient method can be enhanced by means of Nesterov's momentum-based acceleration. The resulting improved convergence rates match those of considerably more complicated methods that typically rely on Lanczos iterations or variants thereof. We provide an empirical study validating the theoretical results.
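To make the general idea concrete, the following is a minimal, hedged sketch of an optimization-based approach to low-rank factorization: plain gradient descent on f(U, V) = ||M - U V^T||_F^2 from a small random initialization, with a fixed conservative step size. This is only an illustration of the problem class; it is not the paper's actual method, step-size rule, or analysis. The dimensions, step size, and iteration count below are arbitrary choices for the demo. A scalar Heron iteration is appended for comparison, since the abstract notes the gradient method mirrors it inside the attractive region.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, d, k = 40, 30, 5, 5          # d = true rank, k = target rank (here k = d)

# Build a random rank-d matrix M.
A = rng.standard_normal((m, d))
B = rng.standard_normal((n, d))
M = A @ B.T

# Small random initialization of the factors.
U = 0.01 * rng.standard_normal((m, k))
V = 0.01 * rng.standard_normal((n, k))

# Conservative fixed step size scaled by the spectral norm of M
# (illustrative only; constants are absorbed into eta).
eta = 1.0 / (4 * np.linalg.norm(M, 2))
for _ in range(5000):
    R = U @ V.T - M                # residual; gradients are R @ V and R.T @ U
    U, V = U - eta * (R @ V), V - eta * (R.T @ U)

# U V^T approaches the best rank-k approximation of M (M itself, since k = d).
print(np.linalg.norm(M - U @ V.T) / np.linalg.norm(M))

# Heron's (Babylonian) method for sqrt(a): x <- (x + a/x) / 2.
x, a = 1.0, 9.0
for _ in range(8):
    x = 0.5 * (x + a / x)
print(x)  # converges to 3.0
```

The tuple assignment updates U and V simultaneously from the same residual, which is the standard simultaneous-gradient step for this objective.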
