Near-Optimal Mean Estimation with Unknown, Heteroskedastic Variances

Given data drawn from a collection of Gaussian variables with a common mean but different and unknown variances, what is the best algorithm for estimating their common mean? We present an intuitive and efficient algorithm for this task. As different closed-form guarantees can be hard to compare, the Subset-of-Signals model serves as a benchmark for heteroskedastic mean estimation: given a collection of Gaussian variables, an unknown subset of which have variance bounded by 1, what is the optimal estimation error as a function of the total number of variables and the size of that subset? Our algorithm resolves this open question up to logarithmic factors, improving upon the previous best known estimation error by polynomial factors when for all . Of particular note, we obtain error with variance-bounded samples, whereas previous work required . Finally, we show that in the multi-dimensional setting, even for , our techniques enable rates comparable to knowing the variance of each sample.