
Online Robust Mean Estimation

Abstract

We study the problem of high-dimensional robust mean estimation in an online setting. Specifically, we consider a scenario where $n$ sensors are measuring some common, ongoing phenomenon. At each time step $t=1,2,\ldots,T$, the $i^{th}$ sensor reports its readings $x^{(i)}_t$ for that time step. The algorithm must then commit to its estimate $\mu_t$ for the true mean value of the process at time $t$. We assume that most of the sensors observe independent samples from some common distribution $X$, but an $\epsilon$-fraction of them may instead behave maliciously. The algorithm wishes to compute a good approximation $\mu$ to the true mean $\mu^\ast := \mathbf{E}[X]$. We note that if the algorithm is allowed to wait until time $T$ to report its estimate, this reduces to the well-studied problem of robust mean estimation. However, the requirement that our algorithm produce partial estimates as the data arrive substantially complicates the situation. We prove two main results about online robust mean estimation in this model. First, if the uncorrupted samples satisfy the standard condition of $(\epsilon,\delta)$-stability, we give an efficient online algorithm that outputs estimates $\mu_t$, $t \in [T]$, such that with high probability it holds that $\|\mu-\mu^\ast\|_2 = O(\delta \log(T))$, where $\mu = (\mu_t)_{t \in [T]}$. We note that this error bound is nearly competitive with the best offline algorithms, which would achieve an $\ell_2$-error of $O(\delta)$. Our second main result shows that with additional assumptions on the input (most notably that $X$ is a product distribution) there are inefficient algorithms whose error does not depend on $T$ at all.
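The Python sketch below makes the online interface concrete. It assumes scalar readings, so that $X$ is a distribution over $\mathbb{R}^T$ and sensor $i$'s stream $(x^{(i)}_1,\ldots,x^{(i)}_T)$ is one sample, revealed one coordinate per time step. The estimator commits to the median across sensors at each step. This is a naive baseline chosen only for illustration and is not the algorithm from the paper. The names `simulate_sensors`, `online_median_estimates` and `l2_error` are hypothetical. The Gaussian noise and the constant attack are also illustrative assumptions, not the paper's model of an adversary.

```python
import math
import random
import statistics


def simulate_sensors(n, T, eps, true_mean, seed=0):
    """Hypothetical data generator: returns an n x T list of readings.

    Good sensors report true_mean[t] + N(0, 1) noise at each step t.
    The first int(eps * n) sensors are corrupted and always report a large
    constant (a simple attack for illustration, not a worst-case one).
    """
    rng = random.Random(seed)
    n_bad = int(eps * n)
    readings = []
    for i in range(n):
        if i < n_bad:
            readings.append([100.0] * T)  # corrupted sensor stream
        else:
            readings.append([true_mean[t] + rng.gauss(0.0, 1.0) for t in range(T)])
    return readings


def online_median_estimates(readings):
    """Naive online baseline: at step t, commit to the median of x_t^(i) over i.

    Each mu_t depends only on readings available at time t, so the online
    constraint is respected. This is NOT the algorithm from the paper.
    """
    T = len(readings[0])
    estimates = []
    for t in range(T):
        column = [row[t] for row in readings]  # every sensor's reading at time t
        estimates.append(statistics.median(column))  # committed before step t + 1
    return estimates


def l2_error(mu, mu_star):
    """Euclidean distance between the estimate vector and the true mean vector."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mu, mu_star)))


if __name__ == "__main__":
    T = 50
    true_mean = [math.sin(0.1 * t) for t in range(T)]
    data = simulate_sensors(n=201, T=T, eps=0.1, true_mean=true_mean)
    mu = online_median_estimates(data)
    print(f"l2 error of the median baseline: {l2_error(mu, true_mean):.3f}")
```

The per-step median resists the corrupted sensors at each time step. Because its per-coordinate errors accumulate in $\ell_2$ over all $T$ steps, it illustrates why the $O(\delta \log(T))$ guarantee above is nontrivial.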
