We consider the fixed-budget best-arm identification problem with Normal reward distributions. In this problem, the forecaster is given $K$ arms (or treatments) and $T$ time steps. The forecaster attempts to find the best arm, defined as the one with the largest mean, via an adaptive experiment conducted with an algorithm. The performance of the algorithm is measured by the simple regret, reflecting the quality of the estimated best arm. While the frequentist simple regret can be exponentially small in $T$, the Bayesian simple regret is only polynomially small in $T$. This paper demonstrates that the Bayes optimal algorithm, which minimizes the Bayesian simple regret, does not yield an exponentially decaying simple regret for some parameters, a finding that contrasts with the many results indicating the asymptotic equivalence of Bayesian and frequentist algorithms in fixed sampling regimes. Although the Bayes optimal algorithm is described by a recursive equation that is virtually impossible to solve exactly, we lay the groundwork for further analysis by introducing a key quantity that we call the expected Bellman improvement.
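For concreteness, one standard formalization of the two notions of simple regret (our notation; the paper's exact definitions may differ) is the following. For a bandit instance $\mu = (\mu_1, \dots, \mu_K)$ and the arm $\hat{I}_T$ recommended after $T$ steps,
\[
  r_T(\mu) \;=\; \max_{i \in [K]} \mu_i \;-\; \mu_{\hat{I}_T},
  \qquad
  \text{frequentist: } \mathbb{E}_{\mu}\!\bigl[r_T(\mu)\bigr],
  \qquad
  \text{Bayesian: } \mathbb{E}_{\mu \sim \Pi}\Bigl[\mathbb{E}_{\mu}\!\bigl[r_T(\mu)\bigr]\Bigr],
\]
where $\Pi$ denotes a prior over instances. The frequentist regret fixes $\mu$ and can decay as $e^{-\Omega(T)}$; intuitively, the Bayesian regret averages over $\Pi$, and instances with nearly tied means carry non-negligible prior mass, which limits the decay to a polynomial rate in $T$.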