arXiv:1909.05886

Nearly Optimal Algorithms for Piecewise-Stationary Cascading Bandits

12 September 2019
Lingda Wang
Huozhi Zhou
Bingcong Li
L. Varshney
Zhizhen Zhao
Abstract

The cascading bandit (CB) is a popular model for web search and online advertising, where an agent aims to learn the $K$ most attractive items out of a ground set of size $L$ during the interaction with a user. However, the stationary CB model may be too simple to apply to real-world problems, where user preferences may change over time. Considering piecewise-stationary environments, two efficient algorithms, \texttt{GLRT-CascadeUCB} and \texttt{GLRT-CascadeKL-UCB}, are developed and shown to ensure regret upper bounds on the order of $\mathcal{O}(\sqrt{NLT\log{T}})$, where $N$ is the number of piecewise-stationary segments and $T$ is the number of time slots. At the crux of the proposed algorithms is an almost parameter-free change-point detector, the generalized likelihood ratio test (GLRT). Compared with existing works, the GLRT-based algorithms: i) are free of change-point-dependent information for choosing parameters; ii) have fewer tuning parameters; iii) improve at least the $L$ dependence in regret upper bounds. In addition, we show that the proposed algorithms are optimal (up to a logarithmic factor) in terms of regret by deriving a minimax lower bound on the order of $\Omega(\sqrt{NLT})$ for piecewise-stationary CB. The efficiency of the proposed algorithms relative to state-of-the-art approaches is validated through numerical experiments on both synthetic and real-world datasets.
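To make the two ingredients named in the abstract concrete, the sketch below shows cascading click feedback and a Bernoulli generalized likelihood ratio (GLR) change-point statistic in its standard textbook form. It is only an illustration under stated assumptions: the abstract does not give the detector's exact statistic or threshold, so the function names (`kl_bernoulli`, `glr_statistic`, `change_detected`, `cascade_feedback`) and the constant `threshold` are hypothetical and are not taken from the paper.

```python
import math
import random


def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def glr_statistic(obs):
    """Largest Bernoulli GLR statistic over all split points of a 0/1 sequence.

    For each split s, compare the means before and after s with the overall mean.
    """
    n = len(obs)
    total = sum(obs)
    mean_all = total / n
    best = 0.0
    prefix = 0.0
    for s in range(1, n):
        prefix += obs[s - 1]
        mean_left = prefix / s
        mean_right = (total - prefix) / (n - s)
        stat = (s * kl_bernoulli(mean_left, mean_all)
                + (n - s) * kl_bernoulli(mean_right, mean_all))
        best = max(best, stat)
    return best


def change_detected(obs, threshold):
    """Flag a change when the GLR statistic reaches a threshold.

    Assumption: `threshold` is a plain constant here, chosen for illustration only.
    """
    return len(obs) >= 2 and glr_statistic(obs) >= threshold


def cascade_feedback(ranked_items, attraction, rng):
    """Cascade model: the user scans the list in order and clicks the first attractive item.

    Returns the clicked position, or None if no item is clicked.
    """
    for pos, item in enumerate(ranked_items):
        if rng.random() < attraction[item]:
            return pos
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    # Attraction probability of one item jumps from 0.2 to 0.8 halfway through.
    clicks = [int(rng.random() < 0.2) for _ in range(200)]
    clicks += [int(rng.random() < 0.8) for _ in range(200)]
    print("change flagged:", change_detected(clicks, threshold=10.0))
    print("clicked position:", cascade_feedback([2, 0, 1], {0: 0.5, 1: 0.1, 2: 0.3}, rng))
```

In a piecewise-stationary setting, a learner would typically keep one such observation history per item and restart its estimates when the detector fires; how the paper's algorithms combine this with UCB or KL-UCB indices is not specified in the abstract and is not reproduced here.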
