Robust subgaussian estimation of a mean vector in nearly linear time

We construct an algorithm, running in time , which is robust to outliers and heavy-tailed data and which achieves the subgaussian rate from [Lugosi, Mendelson] \begin{equation}\label{eq:intro_subgaus_rate} \sqrt{\frac{{\rm Tr}(\Sigma)}{N}}+\sqrt{\frac{||\Sigma||_{op}K}{N}} \end{equation}with probability at least where is the covariance matrix of the informative data, is some parameter (number of block means) and is another parameter of the algorithm. This rate is achieved when where is the number of outliers in the database and under the only assumption that the informative data have a second moment. The algorithm is fully data-dependent and does not use in its construction the proportion of outliers nor the rate above. Its construction combines recently developed tools for Median-of-Means estimators and covering-Semi-definite Programming [Chen, Diakonikolas, Ge] and [Peng, Tangwongsan, Zhang].
View on arXiv