arXiv:2002.07258 (v2, latest)

Statistically Efficient, Polynomial Time Algorithms for Combinatorial Semi Bandits

17 February 2020
Thibaut Cuvelier
Richard Combes
É. Gourdin
Abstract

We consider combinatorial semi-bandits over a set of arms $\mathcal{X} \subset \{0,1\}^d$ where rewards are uncorrelated across items. For this problem, the algorithm ESCB yields the smallest known regret bound $R(T) = \mathcal{O}\left(\frac{d (\ln m)^2 (\ln T)}{\Delta_{\min}}\right)$, but it has computational complexity $\mathcal{O}(|\mathcal{X}|)$, which is typically exponential in $d$, so it cannot be used in large dimensions. We propose the first algorithm that is both computationally and statistically efficient for this problem, with regret $R(T) = \mathcal{O}\left(\frac{d (\ln m)^2 (\ln T)}{\Delta_{\min}}\right)$ and computational complexity $\mathcal{O}(T\,\mathbf{poly}(d))$. Our approach involves carefully designing an approximate version of ESCB with the same regret guarantees, showing that this approximate algorithm can be implemented in time $\mathcal{O}(T\,\mathbf{poly}(d))$ by repeatedly maximizing a linear function over $\mathcal{X}$ subject to a linear budget constraint, and showing how to solve these maximization problems efficiently.
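
The reduction to budgeted linear maximization can be illustrated with a small sketch. It is not the paper's algorithm: it assumes an index of the form $a^\top x + \sqrt{b^\top x}$ with nonnegative integer weights $b$, takes $\mathcal{X}$ to be all subsets of exactly $m$ items, and uses a naive inner solver where the paper would use a polynomial-time one. All function names are hypothetical. The first routine scans every arm, which is the $\mathcal{O}(|\mathcal{X}|)$ cost mentioned above; the second obtains the same optimum by sweeping a budget level $s$ and maximizing the linear part subject to $b^\top x \ge s$.

```python
from itertools import combinations
from math import inf, sqrt


def m_sets(d, m):
    """Enumerate arms as tuples of m item indices out of d (assumed arm set)."""
    return combinations(range(d), m)


def index_value(arm, a, b):
    """Assumed index form: linear term plus square root of a linear term."""
    return sum(a[i] for i in arm) + sqrt(sum(b[i] for i in arm))


def argmax_by_enumeration(d, m, a, b):
    """Exact maximizer by scanning every arm; cost grows with |X| = C(d, m)."""
    return max(m_sets(d, m), key=lambda arm: index_value(arm, a, b))


def budgeted_linear_max(d, m, a, b, s):
    """Maximize a^T x subject to b^T x >= s.

    Solved naively here for illustration only; a polynomial-time solver
    would take the place of this function.
    """
    feasible = [arm for arm in m_sets(d, m) if sum(b[i] for i in arm) >= s]
    if not feasible:
        return None
    return max(feasible, key=lambda arm: sum(a[i] for i in arm))


def argmax_by_budget_sweep(d, m, a, b):
    """Recover the index maximizer from budgeted linear maximizations.

    Assumes b has nonnegative integer entries, so b^T x takes integer
    values between 0 and the sum of the m largest weights.
    """
    max_budget = sum(sorted(b, reverse=True)[:m])
    best_arm, best_val = None, -inf
    for s in range(max_budget + 1):
        arm = budgeted_linear_max(d, m, a, b, s)
        if arm is None:
            continue
        val = index_value(arm, a, b)
        if val > best_val:
            best_arm, best_val = arm, val
    return best_arm


# Toy instance: 5 items, arms are pairs of items.
a = [0.9, 0.8, 0.5, 0.4, 0.1]
b = [1, 1, 4, 9, 16]
exact = argmax_by_enumeration(5, 2, a, b)
swept = argmax_by_budget_sweep(5, 2, a, b)
```

The sweep is exact under these assumptions because, at the budget level $s = b^\top x^\star$ of an optimal arm $x^\star$, any solution of the budgeted problem has a linear part at least $a^\top x^\star$ and a square-root part at least $\sqrt{s}$.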
