Catoni-style confidence sequences for heavy-tailed mean estimation

Abstract

A confidence sequence (CS) is a sequence of confidence intervals that is valid at arbitrary data-dependent stopping times. These are useful in applications such as A/B testing, multi-armed bandits, off-policy evaluation, and election auditing. We present three approaches to constructing a confidence sequence for the population mean, under the minimal assumption that only an upper bound $\sigma^2$ on the variance is known. While previous works rely on light-tail assumptions like boundedness or subGaussianity (under which all moments of a distribution exist), the confidence sequences in our work are able to handle data from a wide range of heavy-tailed distributions. The best among our three methods, the Catoni-style confidence sequence, performs remarkably well in practice, essentially matching the state-of-the-art methods for $\sigma^2$-subGaussian data, and provably attains the $\sqrt{\log \log t/t}$ lower bound due to the law of the iterated logarithm. Our findings have important implications for sequential experimentation with unbounded observations, since the $\sigma^2$-bounded-variance assumption is more realistic and easier to verify than $\sigma^2$-subGaussianity (which implies the former). We also extend our methods to data with infinite variance, but with a finite $p$-th central moment ($1<p<2$).
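
The abstract does not spell out the construction, so the following is only a minimal sketch of what a Catoni-style confidence sequence can look like, not necessarily the authors' exact procedure. It assumes the standard Catoni influence function $\phi(x) = \log(1 + x + x^2/2)$ for $x \ge 0$ (extended as an odd function), a pair of nonnegative supermartingales controlled by Ville's inequality, and a deterministic tuning sequence $\lambda_t$. The function names (`catoni_phi`, `tuning_sequence`, `catoni_interval`) and the particular $\lambda_t$ schedule are illustrative assumptions.

```python
import math
import random


def catoni_phi(x):
    """Catoni's influence function: odd, increasing, logarithmic growth in both tails."""
    if x >= 0:
        return math.log1p(x + 0.5 * x * x)
    return -math.log1p(-x + 0.5 * x * x)


def tuning_sequence(n, sigma2, alpha):
    """Hypothetical schedule lambda_1..lambda_n (an assumption, not taken from the paper).

    Any positive sequence fixed in advance keeps the construction valid;
    the choice only affects how tight the intervals are.
    """
    c = 2.0 * math.log(2.0 / alpha)
    return [math.sqrt(c / (sigma2 * t * math.log(t + 1.0))) for t in range(1, n + 1)]


def _solve_decreasing(f, target, lo, hi, iters=80):
    """Solve f(m) = target by bisection for a continuous, strictly decreasing f."""
    width = hi - lo
    while f(lo) < target:  # widen to the left until f(lo) >= target
        lo -= width
        width *= 2.0
    while f(hi) > target:  # widen to the right until f(hi) <= target
        hi += width
        width *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def catoni_interval(xs, sigma2, alpha=0.05):
    """Interval at time t = len(xs) for the mean, assuming Var(X) <= sigma2.

    The interval is {m : |sum_i phi(lambda_i (x_i - m))| <= bound}, where
    bound = (sigma2 / 2) * sum_i lambda_i^2 + log(2 / alpha).
    """
    lams = tuning_sequence(len(xs), sigma2, alpha)
    bound = 0.5 * sigma2 * sum(l * l for l in lams) + math.log(2.0 / alpha)

    def score(m):
        # Strictly decreasing in m because phi is strictly increasing.
        return sum(catoni_phi(l * (x - m)) for l, x in zip(lams, xs))

    lo, hi = min(xs) - 1.0, max(xs) + 1.0
    lower = _solve_decreasing(score, bound, lo, hi)
    upper = _solve_decreasing(score, -bound, lo, hi)
    return lower, upper


if __name__ == "__main__":
    random.seed(0)
    # Pareto with shape 3: mean 1.5, variance 0.75, infinite third moment.
    data = [random.paretovariate(3.0) for _ in range(2000)]
    for t in (10, 100, 1000, 2000):
        lo, hi = catoni_interval(data[:t], sigma2=0.75)
        print(t, round(lo, 3), round(hi, 3))
```

Because $\lambda_t$ depends only on the index $t$, calling `catoni_interval` on successive prefixes of the data yields intervals from a single sequence, and taking their running intersection preserves the time-uniform guarantee. Each call costs a root-finding pass over all observations so far, so this sketch favors clarity over speed.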
