Blocking Bandits

27 July 2019
Soumya Basu
Rajat Sen
Sujay Sanghavi
Sanjay Shakkottai
arXiv:1907.11975
Abstract

We consider a novel stochastic multi-armed bandit setting in which playing an arm makes it unavailable for a fixed number of time slots thereafter. This models situations where reusing an arm too often is undesirable (e.g., making the same product recommendation repeatedly) or infeasible (e.g., scheduling compute jobs on machines). We show that, even with prior knowledge of the rewards and delays of all the arms, the problem of optimizing cumulative reward does not admit any pseudo-polynomial time algorithm (in the number of arms) unless the randomized exponential time hypothesis is false, by mapping it to the PINWHEEL scheduling problem. Subsequently, we show that a simple greedy algorithm that plays the available arm with the highest reward is asymptotically $(1-1/e)$-optimal. When the rewards are unknown, we design a UCB-based algorithm that achieves $c \log T + o(\log T)$ cumulative regret against the greedy algorithm, leveraging the free exploration of arms that results from their unavailability. Finally, when all the delays are equal, the problem reduces to combinatorial semi-bandits, which provides a lower bound of $c' \log T + \omega(\log T)$.
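The abstract describes the blocking model, the greedy rule, and the UCB-based learner only in words. The Python sketch below illustrates them under explicit assumptions that are not taken from the paper: rewards are Bernoulli; an arm with delay `d` pulled at round `t` becomes available again at round `t + d`; and a round in which every arm is blocked earns nothing. The names `simulate`, `greedy_oracle`, and `ucb_greedy` are hypothetical, and the UCB index used here is plain UCB1 (empirical mean plus `sqrt(2 ln t / n)`), because the abstract says only that the algorithm is "UCB based". This is an illustration, not the authors' implementation.

```python
import math
import random


def simulate(means, delays, horizon, policy, seed=0):
    """Run one blocking-bandit episode and return the total collected reward.

    Assumptions (illustrative only): arm i pays Bernoulli(means[i]) rewards,
    and if pulled at round t it is next available at round t + delays[i].
    policy(t, available, counts, sums) returns an arm index or None.
    """
    rng = random.Random(seed)
    k = len(means)
    free_at = [0] * k    # first round at which each arm is available again
    counts = [0] * k     # number of pulls of each arm so far
    sums = [0.0] * k     # total observed reward of each arm so far
    total = 0.0
    for t in range(horizon):
        available = [i for i in range(k) if free_at[i] <= t]
        arm = policy(t, available, counts, sums) if available else None
        if arm is None:
            continue     # every arm is blocked, so this round earns nothing
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
        free_at[arm] = t + delays[arm]   # block the arm for its delay
    return total


def greedy_oracle(means):
    """Greedy with known means: play the available arm with the highest mean."""
    def policy(t, available, counts, sums):
        return max(available, key=lambda i: means[i])
    return policy


def ucb_greedy(t, available, counts, sums):
    """Greedy on UCB1 indices (assumed index): try unplayed arms first."""
    unplayed = [i for i in available if counts[i] == 0]
    if unplayed:
        return unplayed[0]
    return max(
        available,
        key=lambda i: sums[i] / counts[i]
        + math.sqrt(2 * math.log(t + 1) / counts[i]),
    )
```

A usage example: with `means = [0.9, 0.6, 0.3]` and `delays = [3, 2, 1]`, comparing `simulate(means, delays, 10_000, greedy_oracle(means))` with `simulate(means, delays, 10_000, ucb_greedy)` gives a rough empirical view of how far the learner falls behind the known-means greedy baseline. Both policies share the same "best available arm" structure and differ only in whether they rank arms by true means or by optimistic estimates, which mirrors the abstract's framing of regret measured against the greedy algorithm.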
