Towards Instance Optimal Bounds for Best Arm Identification
In the best arm identification (Best-1-Arm) problem, we are given $n$ stochastic bandit arms, each associated with a reward distribution with an unknown mean. We would like to identify the arm with the largest mean with probability at least $1-\delta$, using as few samples as possible. Understanding the sample complexity of Best-1-Arm has attracted significant attention over the past decade. However, the optimal sample complexity is still unknown. Recently, Chen and Li made an interesting conjecture, called the gap-entropy conjecture, concerning the instance optimal sample complexity of Best-1-Arm. Given a Best-1-Arm instance $I$, let $\mu_{[i]}$ denote the $i$th largest mean and $\Delta_{[i]} = \mu_{[1]} - \mu_{[i]}$ denote the corresponding gap. $H(I) = \sum_{i=2}^{n} \Delta_{[i]}^{-2}$ denotes the complexity of the instance. The gap-entropy conjecture states that for any instance $I$, $\Omega\bigl(H(I)\cdot(\ln \delta^{-1} + \mathrm{Ent}(I))\bigr)$ is an instance lower bound, where $\mathrm{Ent}(I)$ is an entropy-like term completely determined by the $\Delta_{[i]}$s, and there is a $\delta$-correct algorithm for Best-1-Arm with sample complexity $O\bigl(H(I)\cdot(\ln \delta^{-1} + \mathrm{Ent}(I)) + \Delta_{[2]}^{-2}\ln\ln \Delta_{[2]}^{-1}\bigr)$. If the conjecture is true, we would have a complete understanding of the instance-wise sample complexity of Best-1-Arm.

We make significant progress towards the resolution of the gap-entropy conjecture. For the upper bound, we provide a highly nontrivial $\delta$-correct algorithm which requires $O\bigl(H(I)\cdot(\ln \delta^{-1} + \mathrm{Ent}(I)) + \Delta_{[2]}^{-2}\ln\ln \Delta_{[2]}^{-1}\,\mathrm{polylog}(n, \delta^{-1})\bigr)$ samples. For the lower bound, we show that for any Best-1-Arm instance with all gaps of the form $2^{-k}$, any $\delta$-correct monotone algorithm requires at least $\Omega\bigl(H(I)\cdot(\ln \delta^{-1} + \mathrm{Ent}(I))\bigr)$ samples in expectation.
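As a concrete illustration of the quantities above, the sketch below computes the instance complexity $H(I) = \sum_{i\ge 2} \Delta_{[i]}^{-2}$ and a gap-entropy term for a given set of arm means. The grouping of gaps by powers of two used to define $\mathrm{Ent}(I)$ here is an assumption about the definition (one natural reading of an "entropy-like term determined by the gaps"), not a quotation of the paper's exact construction.

```python
import math

def instance_complexity(means):
    """Compute H(I) and a gap-entropy term Ent(I) for a Best-1-Arm instance.

    Assumes a unique best arm. The grouping of gaps into dyadic buckets
    [2^{-k}, 2^{-k+1}) below is a hypothetical reading of the gap-entropy
    definition, used only for illustration.
    """
    mu = sorted(means, reverse=True)
    gaps = [mu[0] - m for m in mu[1:]]      # Delta_[i] for i >= 2
    H = sum(d ** -2 for d in gaps)          # instance complexity H(I)

    # Bucket the gaps: gap d lands in bucket k iff d is in [2^{-k}, 2^{-k+1}).
    buckets = {}
    for d in gaps:
        k = math.ceil(-math.log2(d))
        buckets[k] = buckets.get(k, 0.0) + d ** -2

    # Ent(I) = sum_k p_k * ln(1/p_k), where p_k is bucket k's share of H(I).
    ent = sum((hk / H) * math.log(H / hk) for hk in buckets.values())
    return H, ent

# Example: means 1.0, 0.75, 0.5, 0.5 give gaps 0.25, 0.5, 0.5,
# hence H(I) = 16 + 4 + 4 = 24, split across two dyadic buckets.
H, ent = instance_complexity([1.0, 0.5, 0.5, 0.75])
```

When all gaps fall into a single bucket, the entropy term vanishes and the conjectured bound degenerates to the familiar $H(I)\ln\delta^{-1}$ form; spreading the gaps over many scales is what makes $\mathrm{Ent}(I)$ grow.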