Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits

3 October 2014
Branislav Kveton
Zheng Wen
Azin Ashkan
Csaba Szepesvári
Abstract

A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we close the problem of computationally and sample efficient learning in stochastic combinatorial semi-bandits. In particular, we analyze a UCB-like algorithm for solving the problem, which is known to be computationally efficient; and prove $O(K L (1 / \Delta) \log n)$ and $O(\sqrt{K L n \log n})$ upper bounds on its $n$-step regret, where $L$ is the number of ground items, $K$ is the maximum number of chosen items, and $\Delta$ is the gap between the expected returns of the optimal and best suboptimal solutions. The gap-dependent bound is tight up to a constant factor and the gap-free bound is tight up to a polylogarithmic factor.
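To make the setting concrete, the interaction loop of a UCB-like semi-bandit algorithm can be sketched as follows. This is only an illustrative sketch, not the paper's exact algorithm or analysis: it assumes the simplest feasible set (any subset of $K$ of the $L$ items, so the oracle is just a top-$K$ selection), Bernoulli item weights as a stand-in for bounded stochastic weights, and a generic UCB confidence radius.

```python
import math
import random


def comb_ucb_sketch(L, K, means, n_steps, seed=0):
    """Illustrative UCB-like loop for a combinatorial semi-bandit.

    Assumptions (not from the paper): the feasible set is all subsets of
    size K, so an exact oracle is top-K selection by index; item weights
    are Bernoulli(means[i]) as a stand-in for [0, 1]-bounded weights.
    """
    rng = random.Random(seed)
    counts = [0] * L      # observations per ground item
    est = [0.0] * L       # empirical mean weight per item
    total_reward = 0.0

    for t in range(1, n_steps + 1):
        unseen = [i for i in range(L) if counts[i] == 0]
        if unseen:
            # Initialization: include each unobserved item first.
            seen = [i for i in range(L) if counts[i] > 0]
            chosen = (unseen + seen)[:K]
        else:
            # UCB index per item; the top-K items maximize the sum of
            # indices over this feasible set, i.e. an exact oracle here.
            ucb = [est[i] + math.sqrt(1.5 * math.log(t) / counts[i])
                   for i in range(L)]
            chosen = sorted(range(L), key=lambda i: -ucb[i])[:K]

        # Semi-bandit feedback: the weight of every chosen item is
        # observed individually; the payoff is their sum.
        for i in chosen:
            w = 1.0 if rng.random() < means[i] else 0.0
            total_reward += w
            counts[i] += 1
            est[i] += (w - est[i]) / counts[i]

    return total_reward, est
```

Under semi-bandit feedback the agent updates a separate estimate for each chosen item, which is what allows the per-item confidence radii above; under full bandit feedback only the sum would be observed and this decomposition would not apply.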
