Near Optimal Adversarial Attack on UCB Bandits
- AAML

Abstract
I study a stochastic multi-arm bandit problem where rewards are subject to adversarial corruption. I propose a novel attack strategy that manipulates a learner employing the UCB algorithm into pulling some non-optimal target arm times with a cumulative cost that scales as , where is the number of rounds. I also prove the first lower bound on the cumulative attack cost. The lower bound matches the upper bound up to factors, showing the proposed attack strategy to be near optimal.
View on arXivComments on this paper