Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers

15 March 2017

Abstract

We introduce a criterion, resilience, which allows properties of a dataset (such as its mean or best low rank approximation) to be robustly computed, even in the presence of a large fraction of arbitrary additional data. Resilience is a weaker condition than most other properties considered so far in the literature, and yet enables robust estimation in a broader variety of settings. We provide new information-theoretic results on robust distribution learning, robust estimation of stochastic block models, and robust mean estimation under bounded $k$th moments. We also provide new algorithmic results on robust distribution learning, as well as robust mean estimation in $\ell_p$-norms. Among our proof techniques is a method for pruning a high-dimensional distribution with bounded $1$st moments to a stable "core" with bounded $2$nd moments, which may be of independent interest.
