Multiarmed Bandits With Limited Expert Advice
Annual Conference on Computational Learning Theory (COLT), 2013
Abstract
We solve the COLT 2013 open problem of Seldin et al. on minimizing regret in the setting of advice-efficient multiarmed bandits with expert advice. We give an algorithm for the setting of $K$ arms and $N$ experts, in which we are allowed to query and use the advice of only $M$ experts in each round, with a regret bound of $4\sqrt{\frac{\min\{K, M\} N \log(N)}{M} T}$ after $T$ rounds. We also prove that any algorithm for this problem must have expected regret at least $\Omega\left(\sqrt{\frac{\min\{K, \frac{M}{\log(K)}\} N}{M} T}\right)$, thus showing that our upper bound is nearly tight.
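To get a feel for how the two bounds in the abstract scale, the following minimal sketch evaluates both expressions numerically. It is only an illustration, not part of the paper: the function names are hypothetical, the logarithms are assumed to be natural logarithms, and the constant hidden by the \Omega notation is not specified in the abstract, so the second function returns only the expression inside \Omega.

```python
import math


def upper_bound(K: int, N: int, M: int, T: int) -> float:
    """Evaluate 4 * sqrt(min(K, M) * N * log(N) / M * T).

    Assumption: log is the natural logarithm.
    """
    return 4.0 * math.sqrt(min(K, M) * N * math.log(N) / M * T)


def lower_bound_rate(K: int, N: int, M: int, T: int) -> float:
    """Evaluate sqrt(min(K, M / log(K)) * N / M * T).

    This is the expression inside Omega(...); the hidden constant
    is unknown, so the value is a rate, not an actual bound.
    Requires K >= 2 so that log(K) > 0.
    """
    return math.sqrt(min(K, M / math.log(K)) * N / M * T)


if __name__ == "__main__":
    K, N, T = 4, 16, 10_000
    for M in (1, 4, 16):
        print(M, upper_bound(K, N, M, T), lower_bound_rate(K, N, M, T))
```

With K < N, the evaluated upper bound shrinks as M grows from 1 to N, which reflects the benefit of querying more experts per round.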
