Statistically Optimal Robust Mean and Covariance Estimation for Anisotropic Gaussians

Assume that $X_1, \ldots, X_N$ is an $\varepsilon$-contaminated sample of $N$ independent Gaussian vectors in $\mathbb{R}^d$ with mean $\mu$ and covariance $\Sigma$. In the strong $\varepsilon$-contamination model we assume that the adversary replaced an $\varepsilon$ fraction of vectors in the original Gaussian sample by any other vectors. We show that there is an estimator $\widehat{\mu}$ of the mean satisfying, with probability at least $1 - \delta$, a bound of the form \[ \|\widehat{\mu} - \mu\|_2 \le c\left(\sqrt{\frac{\operatorname{Tr}(\Sigma)}{N}} + \sqrt{\frac{\|\Sigma\|\log(1/\delta)}{N}} + \varepsilon\sqrt{\|\Sigma\|}\right), \] where $c > 0$ is an absolute constant and $\|\Sigma\|$ denotes the operator norm of $\Sigma$. In the same contaminated Gaussian setup, we construct an estimator $\widehat{\Sigma}$ of the covariance matrix that satisfies, with probability at least $1 - \delta$, \[ \left\|\widehat{\Sigma} - \Sigma\right\| \le c\left(\sqrt{\frac{\|\Sigma\|\operatorname{Tr}(\Sigma)}{N}} + \|\Sigma\|\sqrt{\frac{\log(1/\delta)}{N}} + \varepsilon\|\Sigma\|\right). \] Both results are optimal up to multiplicative constant factors. Despite the recent significant interest in robust statistics, achieving both dimension-free bounds in the canonical Gaussian case remained open. In fact, several previously known results were either dimension-dependent and required $\Sigma$ to be close to identity, or had a sub-optimal dependence on the contamination level $\varepsilon$. As a part of the analysis, we derive sharp concentration inequalities for central order statistics of Gaussian, folded normal, and chi-squared distributions.
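Both bounds depend on $\Sigma$ only through $\operatorname{Tr}(\Sigma)$ and $\|\Sigma\|$, never directly through the ambient dimension $d$. To illustrate this, the Python sketch below evaluates their right-hand sides for a diagonal $\Sigma$ given by its eigenvalues. The unspecified absolute constant $c$ is set to 1 purely for illustration. The sketch computes the rates only and does not implement the estimators $\widehat{\mu}$ and $\widehat{\Sigma}$. The function names `mean_rate` and `cov_rate` are hypothetical.

```python
import math


def _trace_and_opnorm(eigs):
    """Return (Tr(Sigma), ||Sigma||) for a PSD Sigma given by its eigenvalues."""
    if not eigs or any(lam < 0 for lam in eigs):
        raise ValueError("eigs must be a non-empty list of non-negative eigenvalues")
    # For a PSD matrix the trace is the sum of eigenvalues and the
    # operator norm is the largest eigenvalue.
    return sum(eigs), max(eigs)


def mean_rate(eigs, n, delta, eps):
    """RHS of the mean bound with the absolute constant c assumed to be 1."""
    tr, op = _trace_and_opnorm(eigs)
    return (math.sqrt(tr / n)
            + math.sqrt(op * math.log(1 / delta) / n)
            + eps * math.sqrt(op))


def cov_rate(eigs, n, delta, eps):
    """RHS of the covariance bound with the absolute constant c assumed to be 1."""
    tr, op = _trace_and_opnorm(eigs)
    return (math.sqrt(op * tr / n)
            + op * math.sqrt(math.log(1 / delta) / n)
            + eps * op)
```

For example, with eigenvalues $\lambda_j = 1/j^2$ the trace stays below $\pi^2/6$ however large $d$ is, so the computed rates for $d = 10$ and $d = 10^4$ nearly coincide. For $\Sigma = I_d$, by contrast, the term $\sqrt{\operatorname{Tr}(\Sigma)/N} = \sqrt{d/N}$ grows with the dimension.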