
Towards Fully Adaptive Regret Minimization in Heavy-Tailed Bandits

Main:11 Pages
Bibliography:5 Pages
1 Tables
Appendix:18 Pages
Abstract

Heavy-tailed distributions naturally arise in many settings, from finance to telecommunications. While regret minimization under sub-Gaussian or bounded support rewards has been widely studied, learning on heavy-tailed distributions has only gained popularity over the last decade. In the stochastic heavy-tailed bandit problem, an agent learns under the assumption that the distributions have finite moments of maximum order $1+\epsilon$ which are uniformly bounded by a constant $u$, for some $\epsilon \in (0,1]$. To the best of our knowledge, the literature only provides algorithms requiring these two quantities as input. In this paper, we study the stochastic adaptive heavy-tailed bandit, a variation of the standard setting where both $\epsilon$ and $u$ are unknown to the agent. We show that adaptivity comes at a cost, introducing two lower bounds on the regret of any adaptive algorithm, implying a higher regret w.r.t. the standard setting. Finally, we introduce a specific distributional assumption and provide Adaptive Robust UCB, a regret minimization strategy matching the known lower bound for the heavy-tailed MAB problem.
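
To illustrate why non-adaptive algorithms need $\epsilon$ and $u$ as input, the sketch below implements a truncated empirical mean of the kind commonly used in the heavy-tailed bandit literature. It is not taken from this paper and is not Adaptive Robust UCB; the function name `truncated_mean` and the confidence parameter `delta` are assumptions made for illustration. The estimator discards the $t$-th sample when its magnitude exceeds $(u t / \log(1/\delta))^{1/(1+\epsilon)}$, so the threshold cannot be computed without knowing both quantities.

```python
import math


def truncated_mean(samples, epsilon, u, delta):
    """Truncated empirical mean for heavy-tailed rewards (illustrative sketch).

    Assumes E[|X|^(1 + epsilon)] <= u with epsilon in (0, 1], and a
    confidence level delta in (0, 1). All names here are hypothetical.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0.0 < epsilon <= 1.0:
        raise ValueError("epsilon must lie in (0, 1]")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")

    log_term = math.log(1.0 / delta)
    total = 0.0
    for t, x in enumerate(samples, start=1):
        # The threshold grows with t, so fewer samples are cut as data accumulates.
        threshold = (u * t / log_term) ** (1.0 / (1.0 + epsilon))
        if abs(x) <= threshold:
            total += x
    # Discarded samples count as zeros: divide by the full sample size.
    return total / len(samples)
```

For example, calling `truncated_mean([0.5, 0.5, 1000.0, 0.5], epsilon=1.0, u=1.0, delta=0.1)` drops the outlier `1000.0`, because the threshold at the third sample is roughly 1.14. An adaptive algorithm, as studied in the paper, has to work without access to such a threshold.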
