Near Optimal Adversarial Attack on UCB Bandits
- AAML

Abstract
We consider a stochastic multi-arm bandit problem where rewards are subject to adversarial corruption. We propose a novel attack strategy that manipulates a UCB principle into pulling some non-optimal target arm $T - o(T)$ times with a cumulative cost that scales as $\sqrt{\log T}$, where $T$ is the number of rounds. We also prove the first lower bound on the cumulative attack cost. Our lower bound matches our upper bound up to $\log \log T$ factors, showing our attack to be near optimal.