Efficient Multivariate Robust Mean Estimation Under Mean-Shift Contamination

20 February 2025

Abstract

We study the algorithmic problem of robust mean estimation of an identity covariance Gaussian in the presence of mean-shift contamination. In this contamination model, we are given a set of points in $\mathbb{R}^d$ generated i.i.d. via the following process. For a parameter $\alpha<1/2$, the $i$-th sample $x_i$ is obtained as follows: with probability $1-\alpha$, $x_i$ is drawn from $\mathcal{N}(\mu, I)$, where $\mu \in \mathbb{R}^d$ is the target mean; and with probability $\alpha$, $x_i$ is drawn from $\mathcal{N}(z_i, I)$, where $z_i$ is unknown and potentially arbitrary. Prior work characterized the information-theoretic limits of this task. Specifically, it was shown that, in contrast to Huber contamination, in the presence of mean-shift contamination consistent estimation is possible. On the other hand, all known robust estimators in the mean-shift model have running times exponential in the dimension. Here we give the first computationally efficient algorithm for high-dimensional robust mean estimation with mean-shift contamination that can tolerate a constant fraction of outliers. In particular, our algorithm has near-optimal sample complexity, runs in sample-polynomial time, and approximates the target mean to any desired accuracy. Conceptually, our result contributes to a growing body of work that studies inference with respect to natural noise models lying in between fully adversarial and random settings.
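
The sampling process described in the abstract can be illustrated with a minimal sketch. It only simulates the mean-shift contamination model and shows why the plain sample mean is a poor estimator under it; it does not implement the paper's algorithm. The function name `sample_mean_shift`, the callback `shift_fn` (which chooses each $z_i$), and all numeric settings are assumptions made for illustration.

```python
import numpy as np

def sample_mean_shift(n, mu, alpha, shift_fn, rng):
    """Draw n points from the mean-shift contamination model (illustrative).

    With probability 1 - alpha a point is drawn from N(mu, I); with
    probability alpha it is drawn from N(z_i, I), where z_i = shift_fn(i)
    stands in for the unknown, potentially arbitrary shifted mean.
    """
    mu = np.asarray(mu, dtype=float)
    d = mu.shape[0]
    # Independent coin flips decide which samples are contaminated.
    is_outlier = rng.random(n) < alpha
    # Every sample starts centered at mu; contaminated ones are re-centered.
    centers = np.tile(mu, (n, 1))
    for i in np.flatnonzero(is_outlier):
        centers[i] = shift_fn(i)
    # Identity-covariance Gaussian noise is added around each center.
    x = centers + rng.standard_normal((n, d))
    return x, is_outlier

# Hypothetical settings, chosen only for this demonstration.
rng = np.random.default_rng(0)
d, n, alpha = 5, 20000, 0.2
mu = np.zeros(d)
shift = lambda i: np.full(d, 10.0)  # assumed adversary: one far-away center

x, is_outlier = sample_mean_shift(n, mu, alpha, shift, rng)

# The naive sample mean is pulled toward the shifted centers.
naive_err = np.linalg.norm(x.mean(axis=0) - mu)
# An oracle that knows the inliers (unavailable in practice) is not.
inlier_err = np.linalg.norm(x[~is_outlier].mean(axis=0) - mu)
print(naive_err, inlier_err)
```

In this sketch the error of the naive mean stays large no matter how many samples are drawn, whereas the oracle error shrinks with `n`. The oracle is shown only as a reference point, since the inlier labels are hidden from any real estimator.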

@article{diakonikolas2025_2502.14772,
  title={Efficient Multivariate Robust Mean Estimation Under Mean-Shift Contamination},
  author={Ilias Diakonikolas and Giannis Iakovidis and Daniel M. Kane and Thanasis Pittas},
  journal={arXiv preprint arXiv:2502.14772},
  year={2025}
}
