Robust subgaussian estimation of a mean vector in nearly linear time

Abstract

We construct an algorithm, running in time $\tilde{\mathcal O}(N d + uK d)$, which is robust to outliers and heavy-tailed data and which achieves the subgaussian rate from [Lugosi, Mendelson] \begin{equation}\label{eq:intro_subgaus_rate} \sqrt{\frac{{\rm Tr}(\Sigma)}{N}}+\sqrt{\frac{||\Sigma||_{op}K}{N}} \end{equation} with probability at least $1-\exp(-c_0K)-\exp(-c_1 u)$, where $\Sigma$ is the covariance matrix of the informative data, $K\in\{1, \ldots, N\}$ is some parameter (number of block means) and $u>0$ is another parameter of the algorithm. This rate is achieved when $K\geq c_1 |\mathcal O|$, where $|\mathcal O|$ is the number of outliers in the dataset, under the sole assumption that the informative data have a finite second moment. The algorithm is fully data-dependent: its construction uses neither the proportion of outliers nor the rate above. It combines recently developed tools for Median-of-Means estimators with covering semidefinite programming [Chen, Diakonikolas, Ge] and [Peng, Tangwongsan, Zhang].
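
To make the quantities in the abstract concrete, the following is a minimal Python sketch, not the algorithm of the paper. It evaluates the rate in the displayed equation for a given $\Sigma$, $N$ and $K$, and forms the $K$ block means that Median-of-Means estimators start from. The function names, the random shuffling of the data before blocking, and the coordinate-wise median used to aggregate the block means are all assumptions made for illustration; the coordinate-wise median is a simple stand-in for the covering-SDP step and is not claimed to achieve the subgaussian rate.

```python
import numpy as np


def subgaussian_rate(cov, n_samples, n_blocks):
    """Evaluate sqrt(Tr(Sigma)/N) + sqrt(||Sigma||_op * K / N)."""
    trace_term = np.sqrt(np.trace(cov) / n_samples)
    op_norm = np.linalg.norm(cov, ord=2)  # spectral (operator) norm
    return trace_term + np.sqrt(op_norm * n_blocks / n_samples)


def block_means(data, n_blocks, rng=None):
    """Shuffle the N rows, split them into K near-equal blocks, average each block.

    Returns an array of shape (K, d). Assumes 1 <= n_blocks <= N.
    """
    rng = np.random.default_rng(rng)
    shuffled = rng.permutation(data.shape[0])
    blocks = np.array_split(shuffled, n_blocks)
    return np.stack([data[block].mean(axis=0) for block in blocks])


def coordinatewise_mom(data, n_blocks, rng=None):
    """Illustrative baseline only: coordinate-wise median of the block means.

    This is NOT the estimator of the paper, which aggregates block means
    through a covering semidefinite program.
    """
    return np.median(block_means(data, n_blocks, rng), axis=0)
```

For instance, with $\Sigma = I_4$, $N = 100$ and $K = 4$, `subgaussian_rate(np.eye(4), 100, 4)` evaluates both terms to $0.2$. On data containing a few gross outliers, taking $K$ larger than twice the number of outliers leaves a majority of blocks uncontaminated, which is the mechanism behind the condition $K\geq c_1 |\mathcal O|$.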
