Robust Sparse Mean Estimation via Sum of Squares

Abstract

We study the problem of high-dimensional sparse mean estimation in the presence of an $\epsilon$-fraction of adversarial outliers. Prior work obtained sample and computationally efficient algorithms for this task for identity-covariance subgaussian distributions. In this work, we develop the first efficient algorithms for robust sparse mean estimation without a priori knowledge of the covariance. For distributions on $\mathbb{R}^d$ with "certifiably bounded" $t$-th moments and sufficiently light tails, our algorithm achieves error of $O(\epsilon^{1-1/t})$ with sample complexity $m = (k\log(d))^{O(t)}/\epsilon^{2-2/t}$. For the special case of the Gaussian distribution, our algorithm achieves near-optimal error of $\tilde{O}(\epsilon)$ with sample complexity $m = O(k^4 \mathrm{polylog}(d))/\epsilon^2$. Our algorithms follow the Sum-of-Squares based, proofs to algorithms approach. We complement our upper bounds with Statistical Query and low-degree polynomial testing lower bounds, providing evidence that the sample-time-error tradeoffs achieved by our algorithms are qualitatively the best possible.
