Tight Memory-Regret Lower Bounds for Streaming Bandits

13 June 2023
Shaoang Li
Lan Zhang
Junhao Wang
Xiang-Yang Li
Abstract

In this paper, we investigate the streaming bandits problem, in which the learner aims to minimize regret while handling arms that arrive online under a sublinear arm-memory constraint. We establish the tight worst-case regret lower bound of $\Omega\left((TB)^{\alpha} K^{1-\alpha}\right)$, where $\alpha = 2^{B}/(2^{B+1}-1)$, for any algorithm with time horizon $T$, number of arms $K$, and number of passes $B$. The result reveals a separation between the stochastic bandits problem in the classical centralized setting and the streaming setting with bounded arm memory. Notably, in comparison to the well-known $\Omega(\sqrt{KT})$ lower bound, an additional double logarithmic factor is unavoidable for any streaming bandits algorithm with sublinear memory permitted. Furthermore, we establish the first instance-dependent lower bound of $\Omega\left(T^{1/(B+1)} \sum_{\Delta_x > 0} \frac{\mu^*}{\Delta_x}\right)$ for streaming bandits. These lower bounds are derived through a unique reduction from the regret-minimization setting to the sample complexity analysis of a sequence of $\epsilon$-optimal arm identification tasks, which may be of independent interest. To complement the lower bounds, we also provide a multi-pass algorithm that achieves a regret upper bound of $\tilde{O}\left((TB)^{\alpha} K^{1-\alpha}\right)$ using constant arm memory.
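To make the stated exponent concrete, here is a worked specialization of the bound (our own arithmetic applied to the abstract's formula, not an additional claim from the paper): in the single-pass case $B = 1$ the exponent is $\alpha = 2/3$, and as the number of passes grows the exponent approaches the centralized rate.

```latex
% Single-pass case (B = 1):
%   alpha = 2^1 / (2^{1+1} - 1) = 2/3,
% so the worst-case lower bound specializes to Omega(T^{2/3} K^{1/3}).
\[
  \alpha = \frac{2^{1}}{2^{2}-1} = \frac{2}{3},
  \qquad
  \Omega\!\left((TB)^{\alpha} K^{1-\alpha}\right)
  = \Omega\!\left(T^{2/3} K^{1/3}\right).
\]
% As B grows, alpha = 1 / (2 - 2^{-B}) decreases toward 1/2,
% so the bound approaches the centralized Omega(sqrt(KT)) rate.
\[
  \lim_{B \to \infty} \alpha
  = \lim_{B \to \infty} \frac{2^{B}}{2^{B+1}-1}
  = \frac{1}{2}.
\]
```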

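To illustrate what the streaming constraint means operationally, below is a minimal, self-contained Python sketch of a single-pass bandit routine that stores only one arm at a time. It is an illustrative toy in the spirit of challenge-the-champion strategies, not the paper's multi-pass algorithm; the per-arm sampling budget is a placeholder assumption.

```python
import random

# Illustrative toy (NOT the paper's algorithm): a single-pass,
# O(1)-arm-memory streaming bandit in the spirit of
# challenge-the-champion strategies. The per-arm budget below is a
# placeholder assumption for the sketch.

def pull(arm_mean: float) -> float:
    """Simulate one Bernoulli reward draw from an arm."""
    return 1.0 if random.random() < arm_mean else 0.0

def single_pass_bandit(arm_means, horizon):
    """Process arms in one streaming pass, keeping statistics for
    only the single currently stored arm (constant arm memory)."""
    budget_per_arm = max(1, horizon // (2 * len(arm_means)))
    stored_mean = None   # the one arm we are allowed to remember
    stored_avg = 0.0
    pulls_used = 0
    total_reward = 0.0

    for mean in arm_means:               # arms arrive online
        n, s = 0, 0.0
        for _ in range(budget_per_arm):  # sample the arriving arm
            if pulls_used >= horizon:
                break
            s += pull(mean)
            n += 1
            pulls_used += 1
        total_reward += s
        avg = s / n if n else 0.0
        # Keep whichever arm looks better empirically; the loser is
        # discarded forever -- that is the streaming constraint.
        if stored_mean is None or avg > stored_avg:
            stored_mean, stored_avg = mean, avg

    while pulls_used < horizon:          # spend the rest exploiting
        total_reward += pull(stored_mean)
        pulls_used += 1
    return total_reward

random.seed(0)
print(single_pass_bandit([0.3, 0.9, 0.5, 0.6], horizon=10_000))
```

Once an arm is discarded it can never be revisited within the pass; an algorithm granted $B$ passes may replay the stream and re-examine arms, which is reflected in the dependence of the exponent $\alpha$ on $B$ in the bounds above.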