Multimodal Bandits: Regret Lower Bounds and Optimal Algorithms
Main: 9 pages
6 figures
Bibliography: 2 pages
Appendix: 20 pages
Abstract
We consider a stochastic multi-armed bandit problem with i.i.d. rewards where the expected reward function is multimodal with at most m modes. We propose the first known computationally tractable algorithm for computing the solution to the Graves-Lai optimization problem, which in turn enables the implementation of asymptotically optimal algorithms for this bandit problem. The code for the proposed algorithms is publicly available at this https URL
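To make the structural assumption concrete, here is a minimal sketch that checks whether a vector of expected rewards has at most m modes. It assumes that arms are the nodes of a known graph, that a mode is an arm whose mean strictly exceeds the mean of every neighbour, and that the default structure is a line of arms (the hypothetical helper `line_graph`). These definitions are illustrative assumptions, not taken from the paper. The sketch does not solve the Graves-Lai optimization problem, which is the paper's actual contribution.

```python
from typing import Dict, List, Sequence


def line_graph(num_arms: int) -> Dict[int, List[int]]:
    """Hypothetical default structure: arms 0..num_arms-1 placed on a line."""
    return {k: [j for j in (k - 1, k + 1) if 0 <= j < num_arms] for k in range(num_arms)}


def modes(mu: Sequence[float], graph: Dict[int, List[int]]) -> List[int]:
    """Arms whose mean strictly exceeds every neighbour's mean (strict local maxima).

    Assumption: ties are not treated as modes; an arm with no neighbours counts as a mode.
    """
    return [k for k in range(len(mu)) if all(mu[k] > mu[j] for j in graph[k])]


def is_multimodal(mu: Sequence[float], graph: Dict[int, List[int]], m: int) -> bool:
    """Check the structural assumption that the reward function has at most m modes."""
    return len(modes(mu, graph)) <= m


if __name__ == "__main__":
    mu = [0.1, 0.5, 0.3, 0.2, 0.6, 0.4]
    g = line_graph(len(mu))
    print(modes(mu, g))               # arms 1 and 4 are local maxima
    print(is_multimodal(mu, g, m=2))  # True
    print(is_multimodal(mu, g, m=1))  # False: two modes exceed m = 1
```

With m = 1 this check reduces to unimodality on the chosen graph. A learner cannot run it on the true means, which are unknown; it only describes the class of reward functions the algorithms are designed for.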
