
Note on Follow-the-Perturbed-Leader in Combinatorial Semi-Bandit Problems

Main: 38 pages, 1 figure, 2 tables; bibliography: 2 pages
Abstract

This paper studies the optimality and complexity of the Follow-the-Perturbed-Leader (FTPL) policy in size-invariant combinatorial semi-bandit problems. Recently, Honda et al. (2023) and Lee et al. (2024) showed that FTPL achieves Best-of-Both-Worlds (BOBW) optimality in standard multi-armed bandit problems with Fréchet-type distributions. However, the optimality of FTPL in combinatorial semi-bandit problems remains unclear. In this paper, we consider the regret bound of FTPL with geometric resampling (GR) in the size-invariant semi-bandit setting, showing that FTPL achieves $O\left(\sqrt{m^2 d^{\frac{1}{\alpha}} T} + \sqrt{mdT}\right)$ regret with Fréchet distributions and the best possible regret bound of $O\left(\sqrt{mdT}\right)$ with Pareto distributions in the adversarial setting. Furthermore, we extend conditional geometric resampling (CGR) to the size-invariant semi-bandit setting, which reduces the computational complexity from $O(d^2)$ for the original GR to $O\left(md\left(\log(d/m)+1\right)\right)$ without sacrificing the regret performance of FTPL.
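
As a rough illustration of the FTPL-with-GR scheme summarized above, the Python sketch below runs a size-invariant semi-bandit loop with Pareto perturbations and truncated geometric resampling. The learning-rate schedule, the truncation level M_max, and the placeholder loss environment are illustrative assumptions, not the tuning or algorithm details of the paper.

import numpy as np

rng = np.random.default_rng(0)

d, m = 10, 3         # d base arms; each action plays a subset of size m (size-invariant)
T = 1000             # horizon
alpha = 2.0          # shape of the Pareto perturbation (assumed value)
M_max = 1000         # truncation level for geometric resampling (assumed value)

L_hat = np.zeros(d)  # cumulative loss estimates

def perturbed_top_m(L_hat, eta):
    """Return the m arms minimizing eta * L_hat - Z with i.i.d. Pareto(alpha) perturbations Z."""
    Z = (1.0 - rng.random(d)) ** (-1.0 / alpha)  # Pareto(alpha) samples on [1, inf)
    scores = eta * L_hat - Z
    return np.argpartition(scores, m - 1)[:m]

for t in range(1, T + 1):
    eta = 1.0 / np.sqrt(m * d * t)   # illustrative learning-rate schedule
    A_t = perturbed_top_m(L_hat, eta)

    # Placeholder adversarial environment: only losses of played arms are observed.
    loss = rng.random(d)
    observed = loss[A_t]             # semi-bandit feedback

    # Geometric resampling: for each played arm i, redraw the perturbed action
    # until i is included again; the trial count K estimates 1/w_{t,i}.
    K = np.full(m, M_max)
    for idx, i in enumerate(A_t):
        for k in range(1, M_max + 1):
            if i in perturbed_top_m(L_hat, eta):
                K[idx] = k
                break

    # Importance-weighted loss estimates feed the cumulative totals.
    L_hat[A_t] += K * observed

Swapping in Fréchet perturbations would amount to drawing Z = (-log U)^(-1/alpha) with U uniform on (0, 1); the conditional resampling variant (CGR) studied in the paper is not sketched here.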

@article{chen2025_2506.12490,
  title={Note on Follow-the-Perturbed-Leader in Combinatorial Semi-Bandit Problems},
  author={Botao Chen and Junya Honda},
  journal={arXiv preprint arXiv:2506.12490},
  year={2025}
}