Secure-UCB: Saving Stochastic Bandits from Poisoning Attacks via Limited Data Verification
This paper studies bandit algorithms under data poisoning attacks in a bounded reward setting. We consider a strong attacker model in which the attacker can observe both the selected actions and their corresponding rewards, and can contaminate the rewards with additive noise. We show that \emph{any} bandit algorithm with regret $O(\log T)$ can be forced to suffer a regret $\Omega(T)$ with an expected amount of contamination $O(\log T)$. This amount of contamination is also necessary, as we prove that there exists an $O(\log T)$ regret bandit algorithm, specifically the classical UCB, that requires $\Omega(\log T)$ amount of contamination to suffer regret $\Omega(T)$. To combat such poisoning attacks, our second main contribution is to propose a novel algorithm, Secure-UCB, which uses limited \emph{verification} to access a limited number of uncontaminated rewards. We show that with $O(\log T)$ expected number of verifications, Secure-UCB can restore the order-optimal $O(\log T)$ regret \emph{irrespective of the amount of contamination} used by the attacker. Finally, we prove that for any bandit algorithm, this number of verifications is necessary to recover the order-optimal regret. We can then conclude that Secure-UCB is order-optimal in terms of both the expected regret and the expected number of verifications, and can save stochastic bandits from any data poisoning attack.
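The attacker model described above can be illustrated with a small simulation. The following is a minimal sketch, not the paper's attack or the Secure-UCB algorithm: it runs the classical UCB1 index rule on Bernoulli arms and, as an assumption made purely for illustration, lets a hypothetical attacker add noise that cancels every reward observed on arms other than a chosen target arm. The names `run_ucb`, `target_arm`, and the arm means are all hypothetical.

```python
import math
import random


def run_ucb(means, horizon, target_arm=None, seed=0):
    """Run UCB1 on Bernoulli arms, optionally poisoning non-target arms.

    Returns (pull counts per arm, total absolute contamination).
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    sums = [0.0] * k  # sums of observed (possibly contaminated) rewards
    contamination = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialise the estimates
        else:
            # UCB1 index: empirical mean plus exploration bonus
            arm = max(
                range(k),
                key=lambda a: sums[a] / pulls[a]
                + math.sqrt(2.0 * math.log(t) / pulls[a]),
            )

        true_reward = 1.0 if rng.random() < means[arm] else 0.0

        # Hypothetical attacker: sees the arm and its reward, then adds
        # noise that cancels the reward on every non-target arm.
        noise = 0.0
        if target_arm is not None and arm != target_arm:
            noise = -true_reward
        contamination += abs(noise)

        pulls[arm] += 1
        sums[arm] += true_reward + noise

    return pulls, contamination


# Arm 0 is optimal (mean 0.9); the attacker promotes arm 1 (mean 0.1).
clean_pulls, clean_cost = run_ucb([0.9, 0.1], horizon=5000)
attacked_pulls, attack_cost = run_ucb([0.9, 0.1], horizon=5000, target_arm=1)
print("clean pulls:   ", clean_pulls)
print("attacked pulls:", attacked_pulls, "contamination:", attack_cost)
```

Without the attack, UCB1 concentrates its pulls on the optimal arm. With the attack, the learner sees only zeros on the optimal arm, so it shifts most pulls to the target arm, and the attacker only pays contamination on the comparatively few rounds in which the optimal arm is still explored. A verification step, as proposed in the abstract, would let the learner observe the uncontaminated `true_reward` on selected rounds; how those rounds are chosen is specified in the paper and is not modelled here.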