
Tight (Lower) Bounds for the Fixed Budget Best Arm Identification Bandit Problem

Abstract

We consider the problem of \textit{best arm identification} with a \textit{fixed budget $T$}, in the $K$-armed stochastic bandit setting, with arm distributions supported on $[0,1]$. We prove that, for any bandit strategy, there is at least one bandit problem of complexity $H$ on which the strategy misidentifies the best arm with probability lower bounded by $\exp\Big(-\frac{T}{\log(K)H}\Big)$, where $H$ is the sum, over all sub-optimal arms, of the inverse squared gaps. Our result formally disproves the general belief, stemming from results in the fixed confidence setting, that there must exist an algorithm for this problem whose probability of error is upper bounded by $\exp(-T/H)$. It also proves that some existing strategies based on the Successive Rejection of the arms are optimal, thereby closing the current gap between upper and lower bounds for the fixed budget best arm identification problem.
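To make the quantities in the abstract concrete, the following is a minimal sketch that computes the complexity $H$ for a given vector of arm means and evaluates the two rates $\exp(-T/(\log(K)H))$ and $\exp(-T/H)$. The function names, the example means, and the budget are hypothetical and chosen only for illustration. The sketch assumes a unique best arm and ignores any constants that a full statement of the bounds may carry, so the outputs indicate orders of magnitude rather than exact error probabilities.

```python
import math


def complexity_h(means):
    """Sum of inverse squared gaps over sub-optimal arms.

    Assumes a unique best arm; arms tied with the best are skipped.
    """
    best = max(means)
    gaps = [best - m for m in means if m < best]
    return sum(1.0 / gap ** 2 for gap in gaps)


def error_rates(means, budget):
    """Return (H, exp(-T / (log(K) * H)), exp(-T / H)) with constants omitted."""
    k = len(means)
    h = complexity_h(means)
    lower_bound_rate = math.exp(-budget / (math.log(k) * h))
    conjectured_rate = math.exp(-budget / h)
    return h, lower_bound_rate, conjectured_rate


if __name__ == "__main__":
    # Hypothetical 3-armed problem with gaps 0.1 and 0.2, and budget T = 1000.
    h, lower, conjectured = error_rates([0.5, 0.4, 0.3], budget=1000)
    print(f"H = {h:.1f}")
    print(f"exp(-T / (log(K) H)) = {lower:.3e}")
    print(f"exp(-T / H)          = {conjectured:.3e}")
```

For $K \geq 3$ we have $\log(K) > 1$, so the first rate is larger than the second; this is the sense in which the lower bound rules out an $\exp(-T/H)$ guarantee holding on every problem.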
