Catoni-style confidence sequences for heavy-tailed mean estimation

A confidence sequence (CS) is a sequence of confidence intervals that is valid at arbitrary data-dependent stopping times. These are useful in applications like A/B testing, multi-armed bandits, off-policy evaluation, election auditing, etc. We present three approaches to constructing a confidence sequence for the population mean, under the minimal assumption that only an upper bound $\sigma^2$ on the variance is known. While previous works rely on light-tail assumptions like boundedness or subGaussianity (under which all moments of a distribution exist), the confidence sequences in our work are able to handle data from a wide range of heavy-tailed distributions. The best among our three methods -- the Catoni-style confidence sequence -- performs remarkably well in practice, essentially matching the state-of-the-art methods for $\sigma^2$-subGaussian data, and provably attains the $\sqrt{\log \log t / t}$ lower bound due to the law of the iterated logarithm. Our findings have important implications for sequential experimentation with unbounded observations, since the $\sigma^2$-bounded-variance assumption is more realistic and easier to verify than $\sigma^2$-subGaussianity (which implies the former). We also extend our methods to data with infinite variance, but having $p$-th central moment ($1 < p < 2$).
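To make the construction concrete, here is a minimal Python sketch of a Catoni-style interval under a known variance bound. It is an illustration and not the paper's reference implementation: the function names `catoni_phi` and `catoni_cs`, the particular weight sequence `lams`, and the bisection solver are all assumptions chosen for simplicity. The sketch relies on Catoni's influence function $\phi$, which satisfies $-\log(1 - x + x^2/2) \le \phi(x) \le \log(1 + x + x^2/2)$, so that $\prod_i \exp\{\pm\phi(\lambda_i (X_i - \mu)) - \lambda_i^2 \sigma^2 / 2\}$ are nonnegative supermartingales and Ville's inequality yields a time-uniform bound.

```python
import math
import random


def catoni_phi(x):
    """Catoni's influence function: roughly linear near 0, logarithmic in the tails."""
    if x >= 0:
        return math.log1p(x + 0.5 * x * x)
    return -math.log1p(-x + 0.5 * x * x)


def catoni_cs(xs, sigma2, alpha=0.05):
    """Interval for the mean at time t = len(xs), given a variance bound sigma2.

    Calling this on every prefix of the data gives a confidence sequence,
    because the weights lams[i] depend only on the index i (an assumed,
    deterministic tuning; the paper's own tuning may differ).
    """
    log_term = math.log(2.0 / alpha)
    # Assumed weight sequence: any positive predictable sequence keeps validity.
    lams = [
        math.sqrt(2.0 * log_term / (sigma2 * i * math.log(i + 1)))
        for i in range(1, len(xs) + 1)
    ]
    # Threshold from Ville's inequality, alpha/2 spent on each side.
    bound = 0.5 * sigma2 * sum(l * l for l in lams) + log_term

    def f(m):
        # Strictly decreasing in m because catoni_phi is strictly increasing.
        return sum(catoni_phi(l * (x - m)) for l, x in zip(lams, xs))

    # Widen the bracket until it contains both roots f(m) = +bound and -bound.
    lo, hi = min(xs) - 1.0, max(xs) + 1.0
    while f(lo) <= bound:
        lo -= hi - lo
    while f(hi) >= -bound:
        hi += hi - lo

    def solve(c):
        # Bisection for the unique m with f(m) = c.
        a, b = lo, hi
        for _ in range(200):
            mid = 0.5 * (a + b)
            if f(mid) > c:
                a = mid
            else:
                b = mid
        return 0.5 * (a + b)

    return solve(bound), solve(-bound)


if __name__ == "__main__":
    random.seed(0)
    # Pareto with shape 3: mean 1.5, variance 0.75, but infinite third moment.
    data = [random.paretovariate(3.0) for _ in range(2000)]
    print(catoni_cs(data, sigma2=0.75))
```

The interval is the set of candidate means $m$ for which $\sum_i \phi(\lambda_i (X_i - m))$ stays within the two thresholds; since that sum is monotone in $m$, two root-finding calls suffice. The logarithmic growth of $\phi$ is what limits the influence of extreme observations.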