On the complexity of All $\varepsilon$-Best Arms Identification

13 February 2022

Abstract

We consider the problem introduced by \cite{Mason2020} of identifying all the $\varepsilon$-optimal arms in a finite stochastic multi-armed bandit with Gaussian rewards. In the fixed-confidence setting, we give a lower bound on the number of samples required by any algorithm that returns the set of $\varepsilon$-good arms with a failure probability less than some risk level $\delta$. This bound takes the form $T_{\varepsilon}^*(\mu)\log(1/\delta)$, where $T_{\varepsilon}^*(\mu)$ is a characteristic time that depends on the vector of mean rewards $\mu$ and the accuracy parameter $\varepsilon$. We also provide an efficient numerical method to solve the convex max-min program that defines the characteristic time. Our method is based on a complete characterization of the alternative bandit instances that the optimal sampling strategy needs to rule out, which makes our bound tighter than the one provided by \cite{Mason2020}. Using this method, we propose a Track-and-Stop algorithm that identifies the set of $\varepsilon$-good arms with high probability and is asymptotically optimal (as $\delta$ goes to zero) in terms of the expected sample complexity. Finally, using numerical simulations, we demonstrate our algorithm's advantage over state-of-the-art methods, even for moderate values of the risk parameter.
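To make the target of the identification problem concrete, here is a minimal Python sketch of the set an algorithm must return. It assumes the additive notion of $\varepsilon$-goodness (an arm is $\varepsilon$-good when its mean is within $\varepsilon$ of the best mean); the abstract does not spell out the definition, so this is an assumption, and the function name `epsilon_good_arms` is hypothetical. The sketch works on known means, whereas the bandit algorithm only sees noisy samples.

```python
from typing import List, Sequence


def epsilon_good_arms(mu: Sequence[float], eps: float) -> List[int]:
    """Return the indices of arms whose mean is within eps of the best mean.

    Assumption: additive epsilon-goodness, i.e. arm a is good
    when mu[a] >= max(mu) - eps. This illustrates the target set only;
    it is not the Track-and-Stop algorithm from the paper.
    """
    if eps < 0:
        raise ValueError("eps must be non-negative")
    if len(mu) == 0:
        return []
    threshold = max(mu) - eps
    return [a for a, mean in enumerate(mu) if mean >= threshold]


if __name__ == "__main__":
    # Illustrative mean vector, not taken from the paper.
    print(epsilon_good_arms([1.0, 0.95, 0.5], eps=0.1))
```

With `eps=0` the function returns exactly the arms that attain the maximum mean, so the problem reduces to best-arm identification when that arm is unique.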
