arXiv:1908.10402

A Near-Optimal Change-Detection Based Algorithm for Piecewise-Stationary Combinatorial Semi-Bandits

27 August 2019
Huozhi Zhou
Lingda Wang
L. Varshney
Ee-Peng Lim
Abstract

We investigate the piecewise-stationary combinatorial semi-bandit problem. Compared to the original combinatorial semi-bandit problem, our setting assumes the reward distributions of base arms may change in a piecewise-stationary manner at unknown time steps. We propose an algorithm, \texttt{GLR-CUCB}, which incorporates an efficient combinatorial semi-bandit algorithm, \texttt{CUCB}, with an almost parameter-free change-point detector, the \emph{Generalized Likelihood Ratio Test} (GLRT). Our analysis shows that the regret of \texttt{GLR-CUCB} is upper bounded by $\mathcal{O}(\sqrt{NKT\log{T}})$, where $N$ is the number of piecewise-stationary segments, $K$ is the number of base arms, and $T$ is the number of time steps. As a complement, we also derive a nearly matching regret lower bound on the order of $\Omega(\sqrt{NKT})$, for both piecewise-stationary multi-armed bandits and combinatorial semi-bandits, using information-theoretic techniques and judiciously constructed piecewise-stationary bandit instances. Our lower bound is tighter than the best available regret lower bound, which is $\Omega(\sqrt{T})$. Numerical experiments on both synthetic and real-world datasets demonstrate the superiority of \texttt{GLR-CUCB} compared to other state-of-the-art algorithms.
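To make the change-point detection idea concrete, the following is a minimal sketch of a generalized likelihood ratio test on the reward history of a single base arm. It assumes Bernoulli rewards and uses a simplified threshold, $\log(3n\sqrt{n}/\delta)$, chosen here for illustration only; it is not claimed to be the exact threshold or procedure analyzed in the paper. The function names (`bernoulli_kl`, `glr_statistic`, `glr_detects_change`) and the default `delta` are hypothetical, and the \texttt{CUCB} arm-selection step is omitted.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def glr_statistic(samples):
    """Largest split statistic over all ways to cut the samples into two segments.

    For a split after s samples, the statistic is
    s * kl(mean_left, mean_all) + (n - s) * kl(mean_right, mean_all).
    """
    n = len(samples)
    if n < 2:
        return 0.0
    total = sum(samples)
    mean_all = total / n
    best = 0.0
    prefix = 0.0
    for s in range(1, n):
        prefix += samples[s - 1]
        mean_left = prefix / s
        mean_right = (total - prefix) / (n - s)
        stat = (s * bernoulli_kl(mean_left, mean_all)
                + (n - s) * bernoulli_kl(mean_right, mean_all))
        best = max(best, stat)
    return best


def glr_detects_change(samples, delta=0.05):
    """Flag a change when the statistic reaches an illustrative (assumed) threshold."""
    n = len(samples)
    if n < 2:
        return False
    threshold = math.log(3 * n * math.sqrt(n) / delta)  # assumption, not from the paper
    return glr_statistic(samples) >= threshold
```

In a detector-plus-bandit scheme of this kind, one would typically run such a test on each base arm's observations and restart the bandit statistics when a change is flagged; the restart policy is left out of this sketch.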
