
Fast UCB-type algorithms for stochastic bandits with heavy and super heavy symmetric noise

Abstract

In this study, we propose a new method for constructing UCB-type algorithms for stochastic multi-armed bandits based on general convex optimization methods with an inexact oracle. We derive regret bounds that correspond to the convergence rates of the optimization methods. We propose a new algorithm, Clipped-SGD-UCB, and show, both theoretically and empirically, that in the case of symmetric noise in the reward it achieves an $O(\log T\sqrt{KT\log T})$ regret bound instead of $O\left(T^{\frac{1}{1+\alpha}} K^{\frac{\alpha}{1+\alpha}}\right)$ when the reward distribution satisfies $\mathbb{E}_{X \in D}[|X|^{1+\alpha}] \leq \sigma^{1+\alpha}$ with $\alpha \in (0, 1]$, i.e., it performs better than the general lower bound for bandits with heavy tails suggests. Moreover, the same bound holds even when the reward distribution does not have an expectation, that is, when $\alpha < 0$.
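To make the idea concrete, below is a minimal, hypothetical sketch of a Clipped-SGD-UCB-style loop: each arm's mean is tracked by clipped SGD on a quadratic loss (so extreme heavy-tailed rewards have bounded influence on the estimate), and arms are chosen optimistically via a UCB bonus. The clipping level, step-size schedule, and bonus constant here are illustrative assumptions, not the exact choices of the paper.

```python
import numpy as np

def clipped_sgd_ucb(pull, K, T, clip_level=1.0, bonus_c=2.0):
    """Hypothetical sketch of a Clipped-SGD-UCB-style bandit loop.

    `pull(k)` returns a (possibly heavy-tailed) reward for arm k.
    Each arm's mean is tracked by clipped SGD on the quadratic loss
    0.5 * (x - m)^2, whose stochastic gradient is (m - x); clipping the
    gradient bounds the influence of extreme rewards.
    """
    est = np.zeros(K)      # clipped-SGD mean estimates
    pulls = np.zeros(K)    # number of times each arm was pulled
    total_reward = 0.0

    for t in range(1, T + 1):
        if t <= K:
            k = t - 1                        # pull each arm once first
        else:
            bonus = np.sqrt(bonus_c * np.log(t) / pulls)
            k = int(np.argmax(est + bonus))  # optimistic arm choice
        x = pull(k)
        pulls[k] += 1
        step = 1.0 / pulls[k]                # decaying step size (assumed schedule)
        grad = np.clip(est[k] - x, -clip_level, clip_level)
        est[k] -= step * grad                # clipped SGD update of the mean
        total_reward += x
    return est, pulls, total_reward


# Example: two arms with symmetric heavy-tailed (Student-t) reward noise.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    means = [0.0, 0.5]
    pull = lambda k: means[k] + rng.standard_t(df=2)
    est, pulls, _ = clipped_sgd_ucb(pull, K=2, T=5000)
    print("estimates:", est, "pulls:", pulls)
```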
