13
2

Near-Optimal Mean Estimation with Unknown, Heteroskedastic Variances

Abstract

Given data drawn from a collection of Gaussian variables with a common mean but different and unknown variances, what is the best algorithm for estimating their common mean? We present an intuitive and efficient algorithm for this task. As different closed-form guarantees can be hard to compare, the Subset-of-Signals model serves as a benchmark for heteroskedastic mean estimation: given nn Gaussian variables with an unknown subset of mm variables having variance bounded by 1, what is the optimal estimation error as a function of nn and mm? Our algorithm resolves this open question up to logarithmic factors, improving upon the previous best known estimation error by polynomial factors when m=ncm = n^c for all 0<c<10<c<1. Of particular note, we obtain error o(1)o(1) with m=O~(n1/4)m = \tilde{O}(n^{1/4}) variance-bounded samples, whereas previous work required m=Ω~(n1/2)m = \tilde{\Omega}(n^{1/2}). Finally, we show that in the multi-dimensional setting, even for d=2d=2, our techniques enable rates comparable to knowing the variance of each sample.

View on arXiv
Comments on this paper