Adversarial Bandits Robust to S-Switch Regret
Abstract
We study the adversarial bandit problem in which the best arm switches an unknown number of times. To handle this problem, we adopt the master-base framework using online mirror descent (OMD). We first provide a master-base algorithm with basic OMD, achieving . To improve the regret bound with respect to , we propose adaptive learning rates for OMD that control the variance of the loss estimators, and achieve , where is a variance term for the loss estimators.
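The abstract builds on online mirror descent over the probability simplex, fed with loss estimators whose variance drives the regret. As a rough illustration only, the sketch below shows a single OMD learner with the negative-entropy regularizer (which reduces to the familiar Exp3-style multiplicative update) and importance-weighted loss estimates. The regularizer choice, all function names, and the optional adaptive learning-rate schedule (the rate shrinks as the running sum of the estimator's second moment Σ_i p_i ℓ̂_i² grows) are assumptions made for illustration; this is not the paper's master-base algorithm or its adaptive rule.

```python
import math
import random


def importance_weighted_estimate(k, arm, observed_loss, probs):
    """Unbiased estimate of the full loss vector from one bandit observation:
    l_hat[i] = loss[i] * 1{i == arm} / p[i]."""
    est = [0.0] * k
    est[arm] = observed_loss / probs[arm]
    return est


def omd_entropy_step(probs, loss_est, lr):
    """One OMD step on the simplex with the negative-entropy regularizer.
    This case has a closed form: p_i <- p_i * exp(-lr * l_hat_i), renormalized."""
    unnorm = [p * math.exp(-lr * l) for p, l in zip(probs, loss_est)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def run_omd_bandit(losses, lr=0.1, adaptive=False, seed=0):
    """Play one OMD learner against a fixed loss sequence.

    losses: list of rounds, each a list of K losses in [0, 1].
    If adaptive=True, uses a HYPOTHETICAL variance-adaptive rate
    lr_t = sqrt(ln K / (1 + V_{t-1})), where V accumulates sum_i p_i * l_hat_i^2
    (illustrative only; not the paper's schedule).
    Returns (total incurred loss, final arm probabilities)."""
    k = len(losses[0])
    probs = [1.0 / k] * k
    rng = random.Random(seed)
    second_moment = 0.0  # running V: past second moments of the estimator
    total = 0.0
    for loss_vec in losses:
        arm = rng.choices(range(k), weights=probs, k=1)[0]
        observed = loss_vec[arm]
        total += observed
        p_arm = probs[arm]  # probability used for this round's estimate
        est = importance_weighted_estimate(k, arm, observed, probs)
        if adaptive:
            step = math.sqrt(math.log(k) / (1.0 + second_moment))
        else:
            step = lr
        probs = omd_entropy_step(probs, est, step)
        # Only the pulled arm has a nonzero estimate, so
        # sum_i p_i * l_hat_i^2 = p_arm * (observed / p_arm)^2 = observed^2 / p_arm.
        second_moment += observed ** 2 / p_arm
    return total, probs


if __name__ == "__main__":
    # Toy instance: arm 0 always has loss 0, arms 1 and 2 always have loss 1.
    losses = [[0.0, 1.0, 1.0]] * 2000
    print(run_omd_bandit(losses, lr=0.1))
    print(run_omd_bandit(losses, adaptive=True))
```

Negative entropy is used here only because it gives a one-line closed-form update; other regularizers (for example log-barrier) change the step but not the estimate-then-update loop. A master-base scheme would run several such learners and combine them with a further OMD layer, which this sketch does not attempt.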
