
Near Optimal Adversarial Attack on UCB Bandits

Abstract

I study a stochastic multi-armed bandit problem where rewards are subject to adversarial corruption. I propose a novel attack strategy that manipulates a learner employing the UCB algorithm into pulling some non-optimal target arm $T - o(T)$ times with a cumulative cost that scales as $\widehat{O}(\sqrt{\log T})$, where $T$ is the number of rounds. I also prove the first lower bound on the cumulative attack cost. The lower bound matches the upper bound up to $O(\log \log T)$ factors, showing the proposed attack strategy to be near optimal.
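
The setting can be illustrated with a minimal simulation sketch. It is not the attack strategy proposed in the paper: the attacker below is a naive fixed-margin heuristic, assumed here purely for illustration, that lowers the reward of any non-target arm until its empirical mean sits below the target arm's empirical mean. The function name `simulate_attack`, the parameters `margin` and `noise`, the Gaussian reward model, the UCB1 index form, and the use of unbounded corrupted rewards are all assumptions, not details taken from the paper.

```python
import math
import random


def simulate_attack(means, target, horizon, margin=0.5, noise=0.1, seed=0):
    """Run a UCB1 learner against a naive reward-corrupting attacker.

    Hypothetical illustration only: a fixed-margin heuristic, not the
    near-optimal strategy described in the abstract.
    Returns (pull counts per arm, cumulative attack cost).
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    sums = [0.0] * k  # sums of the corrupted rewards the learner observes
    cost = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # the learner pulls each arm once to initialize
        else:
            # UCB1 index: empirical mean plus exploration bonus
            arm = max(
                range(k),
                key=lambda i: sums[i] / pulls[i]
                + math.sqrt(2.0 * math.log(t) / pulls[i]),
            )

        reward = rng.gauss(means[arm], noise)
        observed = reward

        # Attacker: push a non-target arm's empirical mean down to at most
        # (target's empirical mean - margin), never raising a reward.
        if arm != target and pulls[target] > 0:
            target_mean = sums[target] / pulls[target]
            cap = (target_mean - margin) * (pulls[arm] + 1) - sums[arm]
            observed = min(reward, cap)

        cost += reward - observed  # corruption is non-negative by construction
        pulls[arm] += 1
        sums[arm] += observed

    return pulls, cost
```

Calling `simulate_attack([0.9, 0.5, 0.2], target=2, horizon=5000)` returns the per-arm pull counts and the total corruption spent. With a fixed margin, this heuristic should be expected to spend noticeably more than the $\widehat{O}(\sqrt{\log T})$ cost stated above, which is what makes the stated bound notable.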
