
Exploiting Numerical Sparsity for Efficient Learning: Faster Eigenvector Computation and Regression

Abstract

In this paper, we obtain improved running times for regression and top eigenvector computation for numerically sparse matrices. Given a data matrix $A \in \mathbb{R}^{n \times d}$ where every row $a \in \mathbb{R}^d$ has $\|a\|_2^2 \leq L$ and numerical sparsity at most $s$, i.e. $\|a\|_1^2 / \|a\|_2^2 \leq s$, we provide faster algorithms for these problems in many parameter settings. For top eigenvector computation, we obtain a running time of $\tilde{O}(nd + r(s + \sqrt{rs}) / \mathrm{gap}^2)$, where $\mathrm{gap} > 0$ is the relative gap between the top two eigenvalues of $A^\top A$ and $r$ is the stable rank of $A$. This running time improves upon the previous best unaccelerated running time of $O(nd + rd / \mathrm{gap}^2)$, since it is always the case that $r \leq d$ and $s \leq d$. For regression, we obtain a running time of $\tilde{O}(nd + (nL/\mu)\sqrt{s n L / \mu})$, where $\mu > 0$ is the smallest eigenvalue of $A^\top A$. This running time improves upon the previous best unaccelerated running time of $\tilde{O}(nd + nLd/\mu)$. This result expands the regimes in which regression can be solved in nearly linear time, from $L/\mu = \tilde{O}(1)$ to $L/\mu = \tilde{O}(d^{2/3} / (sn)^{1/3})$. Furthermore, we obtain similar improvements even when row norms and numerical sparsities are non-uniform, and we show how to achieve even faster running times by accelerating using approximate proximal point [Frostig et al. 2015] / catalyst [Lin et al. 2015]. Our running times depend only on the size of the input and natural numerical measures of the matrix, i.e. its eigenvalues and $\ell_p$ norms, making progress on a key open problem regarding optimal running times for efficient large-scale learning.
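The bounds above are stated in terms of two quantities the abstract relies on: the numerical sparsity $\|a\|_1^2 / \|a\|_2^2$ of a row and the stable rank of $A$. As a minimal illustration (not from the paper), the NumPy sketch below computes both, using the standard definition of stable rank $\|A\|_F^2 / \|A\|_2^2$; the example matrix is a hypothetical one chosen to show that a fully dense row can still be numerically sparse.

```python
import numpy as np

def numerical_sparsity(a):
    """Numerical sparsity s(a) = ||a||_1^2 / ||a||_2^2 of a vector.

    Lies between 1 (a single nonzero entry) and d (all entries equal
    in magnitude), and never exceeds the number of nonzeros of a.
    """
    return np.linalg.norm(a, 1) ** 2 / np.linalg.norm(a, 2) ** 2

def stable_rank(A):
    """Stable rank r(A) = ||A||_F^2 / ||A||_2^2, a smooth surrogate
    for rank satisfying r(A) <= rank(A) <= d."""
    sigma_max = np.linalg.norm(A, 2)  # largest singular value
    return np.linalg.norm(A, "fro") ** 2 / sigma_max ** 2

# Hypothetical example: rows with a few dominant coordinates are
# numerically sparse even though every entry is nonzero.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 50))
A[:, :3] *= 100.0  # three heavy coordinates per row

s = max(numerical_sparsity(row) for row in A)
print(f"max row numerical sparsity s ~ {s:.1f} (d = {A.shape[1]})")
print(f"stable rank r ~ {stable_rank(A):.1f}")
```

In this regime, $s$ and $r$ are both far below $d$, which is exactly when the paper's $\tilde{O}(nd + r(s + \sqrt{rs})/\mathrm{gap}^2)$ bound beats the dense $O(nd + rd/\mathrm{gap}^2)$ baseline.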
