A Near-optimal, Scalable and Corruption-tolerant Framework for Stochastic Bandits: From Single-Agent to Multi-Agent and Beyond

Abstract

We investigate various stochastic bandit problems in the presence of adversarial corruption. A seminal contribution to this area is the BARBAR~\citep{gupta2019better} algorithm, which is both simple and efficient, tolerating significant levels of corruption with nearly no degradation in performance. However, its regret upper bound contains an $O(KC)$ term, while the known lower bound is only $\Omega(C)$. In this paper, we enhance the BARBAR algorithm by proposing a novel framework called BARBAT, which eliminates the factor of $K$ and achieves an optimal regret bound up to a logarithmic factor. We also demonstrate how BARBAT can be extended to various settings, including graph bandits, combinatorial semi-bandits, batched bandits and multi-agent bandits. Compared to the Follow-The-Regularized-Leader (FTRL) family of methods, which provides a best-of-both-worlds guarantee, our approach is more efficient and parallelizable. Notably, FTRL-based methods face challenges in scaling to batched and multi-agent settings.
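To make the setting concrete, below is a minimal, hedged Python sketch of the epoch-based idea underlying BARBAR-style algorithms: in each epoch, every arm is pulled a number of times inversely proportional to its squared estimated gap, gap estimates are then refreshed, and a floor on the gaps ensures no arm is permanently eliminated (which is what buys corruption tolerance). The sampling schedule, the toy corruption model (an adversary zeroing out early pulls of the best arm), and all parameter choices here are illustrative assumptions, not the BARBAT algorithm or its exact constants.

```python
import math
import random


def barbar_sketch(means, corruption_budget, horizon, seed=0):
    """Illustrative epoch-based bandit loop (BARBAR-style sketch).

    means: true Bernoulli means of the K arms (assumed known only to the
           simulator, not the learner).
    corruption_budget: toy adversary zeroes out this many pulls of the
           best arm before running out of budget.
    """
    rng = random.Random(seed)
    K = len(means)
    best_arm = max(range(K), key=lambda j: means[j])
    gaps = [1.0] * K  # optimistic initial gap estimates
    pulls = [0] * K
    total_reward = 0.0
    t, epoch = 0, 0
    while t < horizon:
        epoch += 1
        # Pulls per arm this epoch: inversely proportional to gap^2
        # (simplified schedule; the real algorithm uses careful constants).
        n = [max(1, math.ceil(2 ** epoch / g ** 2)) for g in gaps]
        emp = [0.0] * K
        for i in range(K):
            for _ in range(n[i]):
                if t >= horizon:
                    break
                r = 1.0 if rng.random() < means[i] else 0.0
                if i == best_arm and corruption_budget > 0:
                    r = 0.0  # toy corruption: hide the best arm early on
                    corruption_budget -= 1
                emp[i] += r
                total_reward += r
                pulls[i] += 1
                t += 1
            emp[i] /= max(1, n[i])  # crude average (biased if cut off)
        best_emp = max(emp)
        # Re-estimate gaps, floored at 2^-epoch so every arm keeps
        # getting sampled and corrupted estimates can recover.
        gaps = [max(best_emp - emp[i], 2 ** (-epoch)) for i in range(K)]
    return total_reward, pulls
```

The key design point visible in the sketch is the gap floor: even an arm that corruption makes look bad is still pulled in later epochs, so once the adversary's budget is exhausted the estimates self-correct.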

@article{hu2025_2502.07514,
  title={A Near-optimal, Scalable and Corruption-tolerant Framework for Stochastic Bandits: From Single-Agent to Multi-Agent and Beyond},
  author={Zicheng Hu and Cheng Chen},
  journal={arXiv preprint arXiv:2502.07514},
  year={2025}
}