Near Optimal Adversarial Attack on UCB Bandits

Abstract

We consider a stochastic multi-arm bandit problem where rewards are subject to adversarial corruption. We propose a novel attack strategy that manipulates a UCB principle into pulling some non-optimal target arm $T - o(T)$ times with a cumulative cost that scales as $\sqrt{\log T}$, where $T$ is the number of rounds. We also prove the first lower bound on the cumulative attack cost. Our lower bound matches our upper bound up to $\log \log T$ factors, showing our attack to be near optimal.
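To make the setting concrete, the following is a minimal simulation sketch, not the attack strategy proposed in the paper. It assumes Gaussian rewards with unit variance, a standard UCB1 index, and a deliberately naive attacker that caps the reward shown for any non-target arm at the target arm's empirical mean minus a fixed `margin`. The function name `run_ucb_under_attack` and all of its parameters are hypothetical and chosen only for illustration. The sketch makes no attempt to reach the $\sqrt{\log T}$ cost described in the abstract; it only shows what "pulls of the target arm" and "cumulative attack cost" mean operationally.

```python
import math
import random


def run_ucb_under_attack(means, target, rounds, margin=0.5, seed=0):
    """Simulate UCB1 while an attacker corrupts the rewards the learner sees.

    Assumption: this is a naive illustrative attack, not the paper's strategy.
    Whenever a non-target arm is pulled, the reward shown to the learner is
    capped at (empirical mean of the target arm - margin).
    Returns (pull counts per arm, cumulative attack cost).
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    sums = [0.0] * k  # sums of the corrupted rewards observed by the learner
    cost = 0.0        # total absolute corruption injected by the attacker

    for t in range(1, rounds + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialise the indices
        else:
            arm = max(
                range(k),
                key=lambda a: sums[a] / pulls[a]
                + math.sqrt(2.0 * math.log(t) / pulls[a]),
            )

        reward = rng.gauss(means[arm], 1.0)  # true, uncorrupted reward
        shown = reward
        if arm != target and pulls[target] > 0:
            cap = sums[target] / pulls[target] - margin
            shown = min(reward, cap)

        cost += abs(reward - shown)
        pulls[arm] += 1
        sums[arm] += shown

    return pulls, cost


if __name__ == "__main__":
    # Arm 2 has the lowest true mean and is the attacker's target.
    pulls, cost = run_ucb_under_attack([0.9, 0.5, 0.1], target=2, rounds=5000)
    print("pulls per arm:", pulls)
    print("cumulative attack cost:", round(cost, 2))
```

Running the script prints how often each arm was pulled and the total corruption spent, which are the two quantities the abstract's bounds trade off against each other.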
