PAC Identification of Many Good Arms in Stochastic Multi-Armed Bandits

24 January 2019

Abstract

We consider the problem of identifying any $k$ out of the best $m$ arms in an $n$ -armed stochastic multi-armed bandit. Framed in the PAC setting, this particular problem generalises both the problem of `best subset selection' and that of selecting `one out of the best m' arms [arcsk 2017]. In applications such as crowd-sourcing and drug-designing, identifying a single good solution is often not sufficient. Moreover, finding the best subset might be hard due to the presence of many indistinguishably close solutions. Our generalisation of identifying exactly $k$ arms out of the best $m$ , where $1 \leq k \leq m$ , serves as a more effective alternative. We present a lower bound on the worst-case sample complexity for general $k$ , and a fully sequential PAC algorithm, \GLUCB, which is more sample-efficient on easy instances. Also, extending our analysis to infinite-armed bandits, we present a PAC algorithm that is independent of $n$ , which identifies an arm from the best $\rho$ fraction of arms using at most an additive poly-log number of samples than compared to the lower bound, thereby improving over [arcsk 2017] and [Aziz+AKA:2018]. The problem of identifying $k > 1$ distinct arms from the best $\rho$ fraction is not always well-defined; for a special class of this problem, we present lower and upper bounds. Finally, through a reduction, we establish a relation between upper bounds for the `one out of the best $\rho$ ' problem for infinite instances and the `one out of the best $m$ ' problem for finite instances. We conjecture that it is more efficient to solve `small' finite instances using the latter formulation, rather than going through the former.

View on arXiv

Comments on this paper