Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit

18 May 2012

Abstract

We consider a linear stochastic bandit problem where the dimension $K$ of the unknown parameter $\theta$ is larger than the sampling budget $n$. In such cases, it is in general impossible to derive sub-linear regret bounds, since usual linear bandit algorithms have a regret in $O(K\sqrt{n})$. In this paper we assume that $\theta$ is $S$-sparse, i.e. has at most $S$ non-zero components, and that the space of arms is the unit ball for the $\|\cdot\|_2$ norm. We combine ideas from Compressed Sensing and Bandit Theory and derive algorithms with regret bounds in $O(S\sqrt{n})$.
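
To make the setting concrete, the following is a minimal simulation sketch of the problem described in the abstract, not the paper's algorithm. It assumes Gaussian reward noise, the standard regret definition (the sum over rounds of the gap between the best arm's expected reward and the chosen arm's), and hypothetical helper names (`make_sparse_theta`, `pull`, `best_arm`, `regret`) introduced only for illustration.

```python
import numpy as np


def make_sparse_theta(K, S, rng):
    """Draw a K-dimensional parameter with exactly S non-zero components."""
    theta = np.zeros(K)
    support = rng.choice(K, size=S, replace=False)
    theta[support] = rng.normal(size=S)
    return theta


def pull(x, theta, rng, noise_std=0.1):
    """Noisy linear reward <x, theta> + noise (Gaussian noise is an assumption)."""
    return float(x @ theta) + noise_std * rng.normal()


def best_arm(theta):
    """On the unit l2 ball, the arm maximising <x, theta> is theta / ||theta||_2."""
    return theta / np.linalg.norm(theta)


def regret(arms, theta):
    """Cumulative expected regret of a sequence of arms (assumed definition)."""
    best_value = np.linalg.norm(theta)  # equals <best_arm(theta), theta>
    return sum(best_value - float(x @ theta) for x in arms)


rng = np.random.default_rng(0)
K, S, n = 1000, 5, 100  # dimension larger than the sampling budget: K > n

theta = make_sparse_theta(K, S, rng)

# Baseline: arms drawn uniformly at random on the unit sphere.
random_arms = rng.normal(size=(n, K))
random_arms /= np.linalg.norm(random_arms, axis=1, keepdims=True)

# One noisy observation per round, as a learner would see it.
rewards = [pull(x, theta, rng) for x in random_arms]

random_regret = regret(random_arms, theta)
oracle_regret = regret([best_arm(theta)] * n, theta)
```

With $K > n$, the learner cannot even probe every coordinate once, which is why the sparsity assumption matters: only the $S$ coordinates in the support contribute to the reward.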
