Continuum armed bandit problem of few variables in high dimensions

Workshop on Approximation and Online Algorithms (WAOA), 2013

21 April 2013

Abstract

We consider the stochastic and adversarial settings of continuum armed bandits where the arms are indexed by [0,1]^d. The reward functions r:[0,1]^d -> R are assumed to intrinsically depend on at most k coordinate variables, implying r(x_1,..,x_d) = g(x_{i_1},..,x_{i_k}) for distinct and unknown i_1,..,i_k from {1,..,d} and some locally Hölder continuous g:[0,1]^k -> R with exponent 0 < alpha <= 1. First, we consider the setting where (i_1,..,i_k) is fixed across time. We propose a simple modification of the CAB1 algorithm in which the discrete set of points is constructed so as to obtain a bound of O(n^((alpha+k)/(2*alpha+k)) (log n)^((alpha)/(2*alpha+k)) C(k,d)) on the regret, with C(k,d) depending at most polynomially on k and sub-logarithmically on d. The construction is based on creating partitions of {1,..,d} and is probabilistic; hence our result holds with high probability. Second, we show that if (i_1,..,i_k) is allowed to vary at each time step, then the low-dimensional structure of the reward functions is useless, in the sense that the worst-case regret incurred by any algorithm will be Omega(2^d).
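To illustrate how a partition of {1,..,d} can yield a discrete set of points, here is a minimal Python sketch. It is not the paper's exact construction: the function names, the uniform random block assignment, and the grid {1/m,..,1} are all assumptions made for illustration. The idea shown is that if a partition into k blocks places the active coordinates i_1,..,i_k in distinct blocks, then points that are constant on each block cover a full k-dimensional grid in those active coordinates.

```python
import itertools
import random


def random_partition(d, k, rng):
    """Assign each of the d coordinates to one of k blocks uniformly at random.

    Hypothetical helper: the paper's partitions may be drawn differently.
    """
    return [rng.randrange(k) for _ in range(d)]


def separates(partition, active):
    """Return True if the active coordinates all fall in distinct blocks."""
    blocks = [partition[i] for i in active]
    return len(set(blocks)) == len(blocks)


def grid_points(partition, k, m):
    """Yield points of [0,1]^d that are constant on each block.

    Block values range over the assumed m-point grid {1/m, 2/m, ..., 1},
    so one partition contributes m**k points regardless of d.
    """
    levels = [(j + 1) / m for j in range(m)]
    for values in itertools.product(levels, repeat=k):
        yield tuple(values[block] for block in partition)


# Usage: d = 6 coordinates, k = 2 blocks, m = 3 grid levels per block.
partition = [0, 1, 0, 1, 0, 1]
points = list(grid_points(partition, k=2, m=3))
```

A single random partition separates a fixed set of active coordinates only with some probability, so in a sketch like this one would draw several partitions and take the union of their point sets; this is one way to read why the abstract's guarantee holds with high probability.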
