
Adversarial Bandits Robust to $S$-Switch Regret

Abstract

We study the adversarial bandit problem with $S$ switches of the best arm, where $S$ is unknown. To handle this problem, we adopt the master-base framework using the online mirror descent (OMD) method. We first provide a master-base algorithm with basic OMD, achieving a regret bound of $\tilde{O}(S^{1/2}K^{1/3}T^{2/3})$. To improve the regret bound with respect to $T$, we propose using adaptive learning rates for OMD to control the variance of the loss estimators, and achieve $\tilde{O}(\min\{\mathbb{E}[\sqrt{SKT\rho_T(h^\dagger)}],S\sqrt{KT}\})$, where $\rho_T(h^\dagger)$ is a variance term for the loss estimators.
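
As a rough illustration of the OMD building block mentioned in the abstract, the sketch below runs OMD with a negative-entropy regularizer and importance-weighted loss estimators (the classical Exp3-style update) on a $K$-armed bandit. It is not the paper's master-base algorithm: it handles neither switching best arms nor adaptive learning rates. The function name `omd_negentropy_bandit`, the fixed learning rate `eta`, and the toy loss matrix are assumptions made purely for illustration.

```python
import numpy as np


def omd_negentropy_bandit(losses, eta, rng):
    """Exp3-style OMD sketch (illustrative, not the paper's algorithm).

    losses: array of shape (T, K) with entries in [0, 1], fixed in advance.
    eta:    fixed learning rate (an assumption; the paper uses adaptive rates).
    rng:    numpy random Generator used to sample arms.
    Returns the total incurred loss and the final sampling distribution.
    """
    T, K = losses.shape
    cum_est = np.zeros(K)          # cumulative importance-weighted loss estimates
    p = np.full(K, 1.0 / K)        # start from the uniform distribution
    total_loss = 0.0

    for t in range(T):
        arm = rng.choice(K, p=p)   # bandit feedback: only this arm's loss is seen
        loss = losses[t, arm]
        total_loss += loss

        # Unbiased estimator: observed loss divided by its sampling probability.
        cum_est[arm] += loss / p[arm]

        # OMD with negative entropy on the simplex reduces to a softmax of the
        # negated, scaled cumulative estimates (shifted for numerical stability).
        logits = -eta * cum_est
        logits -= logits.max()
        p = np.exp(logits)
        p /= p.sum()

    return total_loss, p


# Toy instance (hypothetical data): arm 0 always has loss 0, the others loss 1.
T, K = 2000, 3
toy_losses = np.ones((T, K))
toy_losses[:, 0] = 0.0
eta = np.sqrt(np.log(K) / (K * T))
total, final_p = omd_negentropy_bandit(toy_losses, eta, np.random.default_rng(0))
```

On this toy instance the sampling distribution should concentrate on arm 0. Because a single fixed learner of this kind only competes with one fixed arm, tracking a best arm that switches requires something more, such as the master-base construction the abstract describes.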
