
Ridge Leverage Score Sampling for $\ell_p$ Subspace Approximation

Abstract

The $\ell_p$ subspace approximation problem is an NP-hard low rank approximation problem that generalizes the median hyperplane ($p = 1$), principal component analysis ($p = 2$), and center hyperplane problems ($p = \infty$). A popular approach to cope with the NP-hardness is to compute a strong coreset, which is a weighted subset of input points that simultaneously approximates the cost of every $k$-dimensional subspace, typically to $(1+\epsilon)$ relative error for a small constant $\epsilon$.

We obtain an algorithm for constructing a strong coreset for $\ell_p$ subspace approximation of size $\tilde O(k\epsilon^{-4/p})$ for $p<2$ and $\tilde O(k^{p/2}\epsilon^{-p})$ for $p>2$. This offers the following improvements over prior work:

- We construct the first strong coresets with nearly optimal dependence on $k$ for all $p \neq 2$. In prior work, [SW18] constructed coresets of modified points with a similar dependence on $k$, while [HV20] constructed true coresets with polynomially worse dependence on $k$.
- We recover or improve the best known $\epsilon$ dependence for all $p$. In particular, for $p > 2$, the [SW18] coreset of modified points had a dependence of $\epsilon^{-p^2/2}$ and the [HV20] coreset had a dependence of $\epsilon^{-3p}$.

Our algorithm is based on sampling by root ridge leverage scores, which admits fast algorithms, especially for sparse or structured matrices. Our analysis avoids the use of the representative subspace theorem [SW18], which is a critical component of all prior dimension-independent coresets for $\ell_p$ subspace approximation.

Our techniques also lead to the first nearly optimal online strong coresets for $\ell_p$ subspace approximation with similar bounds as the offline setting, resolving a problem of [WY23]. All prior approaches lose $\mathrm{poly}(k)$ factors in this setting, even when allowed to modify the original points.
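To make the sampling idea concrete, the following is a minimal sketch of importance sampling rows by powers of ridge leverage scores. It is not the paper's algorithm: the regularization $\lambda = \|A - A_k\|_F^2 / k$ is one common choice from the ridge leverage score literature, and the exponent `min(1, p/2)`, the sample size `m`, and all function names (`ridge_leverage_scores`, `sample_coreset`, `subspace_cost`) are assumptions made for illustration. The abstract does not specify these details.

```python
import numpy as np


def ridge_leverage_scores(A, k):
    """Ridge leverage scores tau_i = a_i^T (A^T A + lam I)^{-1} a_i.

    Assumption: lam = ||A - A_k||_F^2 / k, a common choice in the
    literature; the paper may use a different regularization.
    """
    s = np.linalg.svd(A, compute_uv=False)
    # Squared Frobenius norm of the tail beyond the top-k singular values.
    lam = max(np.sum(s[k:] ** 2) / k, 1e-12)
    G = A.T @ A + lam * np.eye(A.shape[1])
    X = np.linalg.solve(G, A.T)  # shape (d, n)
    tau = np.einsum("ij,ji->i", A, X)  # row-wise quadratic forms
    return np.clip(tau, 0.0, 1.0)


def sample_coreset(A, k, p, m, rng):
    """Sample m row indices with probability proportional to tau^min(1, p/2).

    The exponent is an illustrative assumption ("root" scores for p < 2).
    Returns indices and weights that make the weighted cost unbiased.
    """
    tau = ridge_leverage_scores(A, k)
    scores = tau ** min(1.0, p / 2.0)
    probs = scores / scores.sum()
    idx = rng.choice(A.shape[0], size=m, replace=True, p=probs)
    weights = 1.0 / (m * probs[idx])
    return idx, weights


def subspace_cost(A, V, p, weights=None):
    """Sum of p-th powers of Euclidean distances from rows of A to span(V).

    V is a (d, k) matrix with orthonormal columns.
    """
    resid = A - (A @ V) @ V.T
    cost = np.linalg.norm(resid, axis=1) ** p
    return cost.sum() if weights is None else float(weights @ cost)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((2000, 20))
    k, p = 3, 1.0
    idx, w = sample_coreset(A, k, p, m=500, rng=rng)
    # Compare full and sampled cost on one arbitrary k-dimensional subspace.
    V, _ = np.linalg.qr(rng.standard_normal((20, k)))
    print(subspace_cost(A, V, p), subspace_cost(A[idx], V, p, weights=w))
```

The weights $1/(m q_i)$ make the sampled cost an unbiased estimate of the full cost for any fixed subspace. A strong coreset guarantee, holding for every $k$-dimensional subspace simultaneously, requires the sample sizes and analysis developed in the paper, which this sketch does not reproduce.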
