
Computing Approximate $\ell_p$ Sensitivities

Abstract

Recent works in dimensionality reduction for regression tasks have introduced the notion of sensitivity, an estimate of the importance of a specific datapoint in a dataset, offering provable guarantees on the quality of the approximation after removing low-sensitivity datapoints via subsampling. However, fast algorithms for approximating $\ell_p$ sensitivities, which we show is equivalent to approximate $\ell_p$ regression, are known for only the $\ell_2$ setting, in which they are termed leverage scores. In this work, we provide efficient algorithms for approximating $\ell_p$ sensitivities and related summary statistics of a given matrix. In particular, for a given $n \times d$ matrix, we compute an $\alpha$-approximation to its $\ell_1$ sensitivities at the cost of $O(n/\alpha)$ sensitivity computations. For estimating the total $\ell_p$ sensitivity (i.e. the sum of $\ell_p$ sensitivities), we provide an algorithm based on importance sampling of $\ell_p$ Lewis weights, which computes a constant factor approximation to the total sensitivity at the cost of roughly $O(\sqrt{d})$ sensitivity computations. Furthermore, we estimate the maximum $\ell_1$ sensitivity, up to a $\sqrt{d}$ factor, using $O(d)$ sensitivity computations. We generalize all these results to $\ell_p$ norms for $p > 1$. Lastly, we experimentally show that for a wide class of matrices in real-world datasets, the total sensitivity can be quickly approximated and is significantly smaller than the theoretical prediction, demonstrating that real-world datasets have low intrinsic effective dimensionality.
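For the $\ell_2$ setting mentioned in the abstract, sensitivities coincide with the classical leverage scores, which are directly computable: the $i$-th leverage score of a full-column-rank matrix $A$ is the squared norm of the $i$-th row of an orthonormal basis for the column space of $A$. A minimal NumPy sketch (the function name and the use of a thin QR factorization are illustrative choices, not from the paper):

```python
import numpy as np

def l2_sensitivities(A):
    """Leverage scores (l2 sensitivities) of the rows of A.

    Assumes A has full column rank; uses a thin QR factorization so
    that the rows of Q are the rows of A expressed in an orthonormal
    basis of col(A). The leverage score of row i is ||Q[i]||^2.
    """
    Q, _ = np.linalg.qr(A)           # thin QR: Q has orthonormal columns
    return np.sum(Q**2, axis=1)      # row-wise squared norms

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))    # n = 100, d = 5
s = l2_sensitivities(A)

# Each score lies in [0, 1], and the total l2 sensitivity equals
# rank(A) = d for a full-rank matrix.
print(round(float(s.sum()), 6))      # 5.0
```

This illustrates the "total sensitivity" statistic discussed above: for $\ell_2$ it is exactly $d$, whereas for general $p$ it can only be bounded (and, per the abstract, is often much smaller in practice).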
