Gradient Ascent for Active Exploration in Bandit Problems

20 May 2019

Pierre Ménard

Abstract

We present a new algorithm based on gradient ascent for a general Active Exploration bandit problem in the fixed-confidence setting. This problem encompasses several well-studied problems, such as Best Arm Identification and Thresholding Bandits. The algorithm consists of a new sampling rule based on an online lazy mirror ascent. We prove that it is asymptotically optimal and, most importantly, computationally efficient.
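
To give a rough sense of what a lazy mirror ascent update looks like, the following is a minimal generic sketch on the probability simplex using the negative-entropy mirror map (an exponentiated-gradient style update on the accumulated gradients). It is not the paper's sampling rule: the function name `lazy_mirror_ascent`, the fixed step size `lr`, and the toy linear objective are all assumptions made for illustration only.

```python
import math
from typing import Callable, List


def lazy_mirror_ascent(
    grad: Callable[[List[float]], List[float]],
    dim: int,
    steps: int,
    lr: float,
) -> List[float]:
    """Generic lazy mirror ascent on the simplex (illustrative, hypothetical helper).

    "Lazy" means gradients are accumulated in the dual space and the
    iterate is recomputed from the running sum at every step.
    """
    cum_grad = [0.0] * dim          # running sum of gradients (dual variable)
    w = [1.0 / dim] * dim           # start from the uniform distribution
    for _ in range(steps):
        g = grad(w)
        cum_grad = [c + gi for c, gi in zip(cum_grad, g)]
        # Map back to the simplex: w_i proportional to exp(lr * cum_grad_i).
        # Subtracting the max keeps the exponentials numerically stable.
        shift = max(cum_grad)
        unnorm = [math.exp(lr * (c - shift)) for c in cum_grad]
        total = sum(unnorm)
        w = [u / total for u in unnorm]
    return w


# Toy objective (an assumption): maximize the linear function <rewards, w>,
# whose gradient is the constant vector `rewards`.
rewards = [0.1, 0.5, 0.3]
w = lazy_mirror_ascent(lambda _w: rewards, dim=3, steps=100, lr=0.5)
```

On this toy objective the weights concentrate on the coordinate with the largest gradient entry. In an active exploration setting the objective would instead depend on the estimated arm means, and the weights would be read as target sampling proportions over the arms.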
