Recent works in dimensionality reduction for regression tasks have introduced the notion of sensitivity, an estimate of the importance of a specific datapoint in a dataset, offering provable guarantees on the quality of the approximation after removing low-sensitivity datapoints via subsampling. However, fast algorithms for approximating $\ell_p$ sensitivities, which we show is equivalent to approximate $\ell_p$ regression, are known only for the $\ell_2$ setting, in which they are termed leverage scores. In this work, we provide efficient algorithms for approximating $\ell_p$ sensitivities and related summary statistics of a given matrix. In particular, for a given $n \times d$ matrix, we compute $\alpha$-approximations to its $\ell_1$ sensitivities at the cost of $O(n/\alpha)$ sensitivity computations. For estimating the total $\ell_p$ sensitivity (i.e., the sum of $\ell_p$ sensitivities), we provide an algorithm based on importance sampling of $\ell_p$ Lewis weights, which computes a constant-factor approximation to the total sensitivity at the cost of roughly $O(\sqrt{d})$ sensitivity computations. Furthermore, we estimate the maximum $\ell_1$ sensitivity, up to a $\sqrt{d}$ factor, using $O(d)$ sensitivity computations. We generalize all these results to $\ell_p$ norms for $p > 1$. Lastly, we experimentally show that for a wide class of matrices in real-world datasets, the total sensitivity can be quickly approximated and is significantly smaller than the theoretical prediction, demonstrating that real-world datasets have low intrinsic effective dimensionality.
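For concreteness, a minimal sketch of the quantities involved, assuming a dense NumPy matrix: the $\ell_p$ sensitivity of row $i$ of $A$ is $\sup_{x} |\langle a_i, x\rangle|^p / \|Ax\|_p^p$, and for $p = 2$ this is the classical leverage score. The brute-force maximizer below is purely illustrative and is not the efficient algorithm described above; all function names are placeholders for this sketch.

```python
import numpy as np
from scipy.optimize import minimize

def leverage_scores(A):
    """ell_2 sensitivities (leverage scores): squared row norms of an
    orthonormal basis for the column space of A."""
    Q, _ = np.linalg.qr(A)           # thin QR; columns of Q span col(A)
    return np.sum(Q**2, axis=1)      # tau_i = ||q_i||_2^2

def lp_sensitivity(A, i, p=1.0, n_restarts=20, seed=0):
    """Crude estimate of the ell_p sensitivity of row i, obtained by
    maximizing |a_i . x|^p / ||A x||_p^p over random restarts."""
    rng = np.random.default_rng(seed)
    a_i, best = A[i], 0.0

    def neg_ratio(x):
        denom = np.sum(np.abs(A @ x) ** p) + 1e-12   # avoid division by zero
        return -np.abs(a_i @ x) ** p / denom

    for _ in range(n_restarts):
        x0 = rng.standard_normal(A.shape[1])
        res = minimize(neg_ratio, x0)
        best = max(best, -res.fun)
    return best

A = np.random.default_rng(1).standard_normal((200, 5))
tau = leverage_scores(A)
print("total ell_2 sensitivity (equals rank(A) = 5):", tau.sum())
print("one ell_1 sensitivity estimate:", lp_sensitivity(A, 0, p=1.0))
```

Summing the $\ell_2$ sensitivities recovers the rank of $A$, which is why the total sensitivity serves as a measure of effective dimensionality in the experiments described above.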