A Near-optimal, Scalable and Corruption-tolerant Framework for Stochastic Bandits: From Single-Agent to Multi-Agent and Beyond

We investigate various stochastic bandit problems in the presence of adversarial corruption. A seminal contribution to this area is the BARBAR algorithm (Gupta et al., 2019), which is both simple and efficient, tolerating significant levels of corruption with nearly no degradation in performance. However, its regret upper bound scales as $O(KC)$, while the lower bound is $\Omega(C)$, where $K$ is the number of arms and $C$ is the total corruption level. In this paper, we enhance the BARBAR algorithm by proposing a novel framework called BARBAT, which eliminates the factor of $K$ and achieves an optimal regret bound up to a logarithmic factor. We also demonstrate how BARBAT can be extended to various settings, including graph bandits, combinatorial semi-bandits, batched bandits and multi-agent bandits. Compared with the Follow-The-Regularized-Leader (FTRL) family of methods, which provide a best-of-both-worlds guarantee, our approach is more efficient and parallelizable. Notably, FTRL-based methods face challenges in scaling to batched and multi-agent settings.
@article{hu2025_2502.07514,
  title={A Near-optimal, Scalable and Corruption-tolerant Framework for Stochastic Bandits: From Single-Agent to Multi-Agent and Beyond},
  author={Zicheng Hu and Cheng Chen},
  journal={arXiv preprint arXiv:2502.07514},
  year={2025}
}