Multimodal Bandits: Regret Lower Bounds and Optimal Algorithms

Main: 9 pages
6 figures
Bibliography: 2 pages
Appendix: 20 pages

Abstract

We consider a stochastic multi-armed bandit problem with i.i.d. rewards in which the expected reward function is multimodal with at most m modes. We propose the first known computationally tractable algorithm for solving the Graves-Lai optimization problem, which in turn enables the implementation of asymptotically optimal algorithms for this bandit problem. The code for the proposed algorithms is publicly available at this https URL
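To make the structural constraint concrete, the sketch below checks whether a vector of expected rewards has at most m modes. It assumes, for illustration only, that arms are the vertices of a known undirected graph given as an adjacency dict and that a "mode" is a strict local maximum of the expected reward over that graph. The helper names (`count_modes`, `is_multimodal`, `line_graph`) are hypothetical and are not taken from the authors' code. This only verifies membership in the structure class; it does not solve the Graves-Lai problem.

```python
from typing import Dict, List, Sequence


def count_modes(means: Sequence[float], neighbors: Dict[int, List[int]]) -> int:
    """Count arms whose mean strictly exceeds the mean of every neighboring arm.

    Assumption (illustrative): a mode is a strict local maximum of the expected
    reward on the arm graph; ties (plateaus) are not counted as modes.
    """
    modes = 0
    for arm, mu in enumerate(means):
        # An arm with no neighbors is vacuously a local maximum.
        if all(mu > means[nb] for nb in neighbors.get(arm, [])):
            modes += 1
    return modes


def is_multimodal(means: Sequence[float], neighbors: Dict[int, List[int]], m: int) -> bool:
    """Return True if the reward vector has at most m modes on the given graph."""
    return count_modes(means, neighbors) <= m


def line_graph(k: int) -> Dict[int, List[int]]:
    """Hypothetical helper: arms 0..k-1 arranged on a line (path graph)."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < k] for i in range(k)}


if __name__ == "__main__":
    mu = [0.1, 0.5, 0.3, 0.2, 0.6, 0.4]
    g = line_graph(len(mu))
    print(count_modes(mu, g))       # arms 1 and 4 are local maxima
    print(is_multimodal(mu, g, 1))  # False: two modes exceed m = 1
    print(is_multimodal(mu, g, 2))  # True
```

With m = 1 on a line graph this reduces to the familiar unimodal setting; larger m admits reward functions with several local peaks, which is what makes the corresponding Graves-Lai problem harder to solve.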
