Learning the distribution with largest mean: two bandit frameworks

Abstract

Over the past few years, the multi-armed bandit model has become increasingly popular in the machine learning community, in part because of applications including online content optimization. This paper reviews two different sequential learning tasks that have been considered in the bandit literature; both can be formulated as (sequentially) learning the distribution that has the highest mean among a set of distributions, with some constraints on the learning process. For both of them (regret minimization and best arm identification), we present (asymptotically) optimal algorithms, some of which are quite recent. We compare the behavior of the sampling rule of each algorithm as well as the complexity terms associated with each problem.
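
For orientation, the two tasks named above are commonly formalized as follows. This is a sketch under assumed notation that does not appear in the abstract: arms $a = 1, \dots, K$ with means $\mu_a$, a unique best arm $a^\star$ with mean $\mu^\star$, the arm $A_t$ sampled at round $t$, a horizon $T$, a stopping time $\tau$, a recommended arm $\hat{a}_\tau$, and a risk level $\delta$. The best arm identification objective is shown in its fixed-confidence form, which is only one of the variants studied in the literature.

```latex
% Regret minimization: keep the expected shortfall relative to always
% playing the best arm as small as possible over T rounds.
\[
  R_T \;=\; T\,\mu^\star \;-\; \mathbb{E}\Bigl[\sum_{t=1}^{T} \mu_{A_t}\Bigr],
  \qquad \mu^\star = \max_{1 \le a \le K} \mu_a .
\]

% Best arm identification (fixed-confidence form, assumed here):
% stop as early as possible while recommending the wrong arm
% with probability at most \delta.
\[
  \text{minimize } \mathbb{E}[\tau]
  \quad \text{subject to} \quad
  \mathbb{P}\bigl(\hat{a}_\tau \neq a^\star\bigr) \le \delta .
\]
```

In the first objective every draw of a suboptimal arm is paid for, whereas in the second only the sample count and the correctness of the final recommendation matter; this difference is what the sampling rules for the two problems reflect.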
