Robust Sparse Mean Estimation via Incremental Learning
In this paper, we study the problem of robust sparse mean estimation, where the goal is to estimate a -sparse mean from a collection of partially corrupted samples drawn from a heavy-tailed distribution. Existing estimators face two critical challenges in this setting. First, the existing estimators rely on the prior knowledge of the sparsity level . Second, the existing estimators fall short of practical use as they scale poorly with the ambient dimension. This paper presents a simple mean estimator that overcomes both challenges under moderate conditions: it works without the knowledge of and runs in near-linear time and memory (both with respect to the ambient dimension). Moreover, provided that the signal-to-noise ratio is large, we can further improve our result to match the information-theoretic lower bound. At the core of our method lies an incremental learning phenomenon: we introduce a simple nonconvex framework that can incrementally learn the top- nonzero elements of the mean while keeping the zero elements arbitrarily small. Finally, we conduct a series of simulations to corroborate our theoretical findings.
View on arXiv