
An Asymptotically Optimal Strategy for Constrained Multi-armed Bandit Problems

Abstract

For the stochastic multi-armed bandit (MAB) problem under a constrained model that generalizes the classical one, we show that asymptotic optimality is achievable by a simple strategy extended from the $\epsilon_t$-greedy strategy. We provide a finite-time lower bound on the probability of correctly selecting an optimal near-feasible arm that holds for all time steps. Under some conditions, the bound approaches one as time $t$ goes to infinity. A particular example sequence $\{\epsilon_t\}$ whose asymptotic convergence rate is of order $(1-\frac{1}{t})^4$ for all sufficiently large $t$ is also discussed.
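As a rough illustration of the kind of strategy the abstract describes, the sketch below implements a generic $\epsilon_t$-greedy rule for a cost-constrained bandit. The constrained model here (each arm yields a reward and a cost, and an arm is deemed feasible when its empirical mean cost is within a limit), the exploration schedule $\epsilon_t = \min(1, n/t)$, and all names are illustrative assumptions, not the paper's exact formulation.

```python
import random

def epsilon_t_greedy(arms, horizon, cost_limit, seed=0):
    """Illustrative epsilon_t-greedy for a cost-constrained bandit.

    `arms` is a list of callables, each returning a (reward, cost) sample.
    The target is the best-reward arm whose mean cost is <= cost_limit.
    This is a hypothetical sketch, not the strategy analyzed in the paper.
    """
    rng = random.Random(seed)
    n = len(arms)
    counts = [0] * n
    reward_sum = [0.0] * n
    cost_sum = [0.0] * n
    for t in range(1, horizon + 1):
        eps_t = min(1.0, n / t)  # decreasing exploration rate (illustrative choice)
        if 0 in counts or rng.random() < eps_t:
            i = rng.randrange(n)  # explore: pull a uniformly random arm
        else:
            # exploit: best empirical reward among empirically feasible arms
            feasible = [j for j in range(n)
                        if cost_sum[j] / counts[j] <= cost_limit]
            pool = feasible or list(range(n))
            i = max(pool, key=lambda j: reward_sum[j] / counts[j])
        r, c = arms[i]()
        counts[i] += 1
        reward_sum[i] += r
        cost_sum[i] += c
    # report the most-played arm as the strategy's selection
    return max(range(n), key=lambda j: counts[j])
```

Because $\epsilon_t$ decays like $1/t$, exploration pulls grow only logarithmically in the horizon, so the empirically best feasible arm dominates the play counts; the paper's analysis concerns how fast the probability of this selection being correct approaches one.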
