Adversarial Bandits Robust to Switching Targets

Abstract

We study the adversarial bandit problem in which the best arm switches $S$ times, for unknown $S$. To handle this problem, we adopt the master-base framework using the online mirror descent (OMD) method. We first provide a master-base algorithm with basic OMD, achieving a regret bound of $\tilde{O}(S^{1/2}K^{1/3}T^{2/3})$. To improve the regret bound with respect to $T$, we propose using adaptive learning rates for OMD to control the variance of the loss estimators, and achieve $\tilde{O}(\min\{\mathbb{E}[\sqrt{SKT\rho_T(h^\dagger)}],S\sqrt{KT}\})$, where $\rho_T(h^\dagger)$ is a variance term for the loss estimators.
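For readers unfamiliar with OMD in the bandit setting, the following is a minimal sketch of the standard building block only: OMD with a negative-entropy regularizer and importance-weighted loss estimators, which coincides with the classical Exp3 update. It is not the paper's master-base algorithm and does not use its adaptive learning rates. The function name `exp3`, the `loss_fn(t, arm)` interface, and the fixed learning rate `lr` are assumptions made for illustration.

```python
import math
import random


def exp3(loss_fn, n_arms, horizon, lr, seed=0):
    """OMD with a negative-entropy regularizer (Exp3), illustrative sketch.

    loss_fn(t, arm) -> loss in [0, 1] is a hypothetical interface.
    Returns the list of pulled arms and the total incurred loss.
    """
    rng = random.Random(seed)
    cum_est = [0.0] * n_arms  # cumulative estimated loss per arm
    pulls, total = [], 0.0
    for t in range(horizon):
        # Closed-form OMD step on the simplex:
        # p_i is proportional to exp(-lr * cumulative estimated loss).
        shift = min(cum_est)  # subtract the minimum for numerical stability
        weights = [math.exp(-lr * (c - shift)) for c in cum_est]
        norm = sum(weights)
        probs = [w / norm for w in weights]

        arm = rng.choices(range(n_arms), weights=probs)[0]
        loss = loss_fn(t, arm)

        # Importance-weighted estimator: unbiased, but its variance grows
        # as probs[arm] shrinks. Only the pulled arm is updated.
        cum_est[arm] += loss / probs[arm]

        pulls.append(arm)
        total += loss
    return pulls, total


if __name__ == "__main__":
    K, T = 3, 2000

    # Hypothetical environment with one switch of the best arm at T/2.
    def switching_loss(t, arm):
        best = 0 if t < T // 2 else 1
        return 0.1 if arm == best else 0.9

    lr = math.sqrt(2 * math.log(K) / (K * T))
    _, total = exp3(switching_loss, K, T, lr)
    print("total loss:", total)
```

A single fixed-rate instance like this one is tuned for a fixed best arm; the abstract's master-base framework and adaptive learning rates are aimed at the case where the best arm switches an unknown number of times.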
