A minimax and asymptotically optimal algorithm for stochastic bandits

International Conference on Algorithmic Learning Theory (ALT), 2017

23 February 2017

Pierre Ménard

Abstract

We propose the kl-UCB++ algorithm for regret minimization in stochastic bandit models with exponential families of distributions. We prove that it is simultaneously asymptotically optimal (in the sense of Lai and Robbins' lower bound) and minimax optimal. This is the first algorithm proved to enjoy these two properties at the same time. This work thus merges two different lines of research with simple and clear proofs.
