Thresholding Bandit with Optimal Aggregate Regret

Neural Information Processing Systems (NeurIPS), 2019

27 May 2019

Abstract

We consider the thresholding bandit problem, whose goal is to find arms of mean rewards above a given threshold $\theta$ , with a fixed budget of $T$ trials. We introduce LSA, a new, simple and anytime algorithm that aims to minimize the aggregate regret (or the expected number of mis-classified arms). We prove that our algorithm is instance-wise asymptotically optimal. We also provide comprehensive empirical results to demonstrate the algorithm's superior performance over existing algorithms under a variety of different scenarios.

View on arXiv

Comments on this paper