Towards Fully Adaptive Regret Minimization in Heavy-Tailed Bandits

4 October 2023

Gianmarco Genalti

Lupo Marsigli

Nicola Gatti

Alberto Maria Metelli

ArXiv (abs)PDF HTML Github

Main:11 Pages

Bibliography:5 Pages

1 Tables

Appendix:18 Pages

Abstract

Heavy-tailed distributions naturally arise in many settings, from finance to telecommunications. While regret minimization under sub-Gaussian or bounded support rewards has been widely studied, learning on heavy-tailed distributions only gained popularity over the last decade. In the stochastic heavy-tailed bandit problem, an agent learns under the assumption that the distributions have finite moments of maximum order $1+\epsilon$ which are uniformly bounded by a constant $u$ , for some $\epsilon \in (0,1]$ . To the best of our knowledge, literature only provides algorithms requiring these two quantities as an input. In this paper, we study the stochastic adaptive heavy-tailed bandit, a variation of the standard setting where both $\epsilon$ and $u$ are unknown to the agent. We show that adaptivity comes at a cost, introducing two lower bounds on the regret of any adaptive algorithm, implying a higher regret w.r.t. the standard setting. Finally, we introduce a specific distributional assumption and provide Adaptive Robust UCB, a regret minimization strategy matching the known lower bound for the heavy-tailed MAB problem.

View on arXiv

Comments on this paper