ResearchTrend.AI

arXiv:1811.11925

Regret Bounds for Stochastic Combinatorial Multi-Armed Bandits with Linear Space Complexity

29 November 2018
Mridul Agarwal
Vaneet Aggarwal
Abstract

Many real-world problems face the dilemma of choosing the best K out of N options at a given time instant. This setup can be modelled as a combinatorial bandit that chooses K out of N arms at each time step, with the aim of achieving an efficient tradeoff between exploration and exploitation. This is the first work on combinatorial bandits where the reward received can be a non-linear function of the K chosen arms. The direct use of a multi-armed bandit requires choosing among N-choose-K options, making the state space large. In this paper, we present a novel algorithm that is computationally efficient and whose storage is linear in N. The proposed algorithm is a divide-and-conquer based strategy, which we call CMAB-SM. Further, the proposed algorithm achieves a regret bound of Õ(K^{1/2} N^{1/3} T^{2/3}) for a time horizon T, which is sub-linear in all parameters T, N, and K. Evaluation results on different reward functions and arm distributions show significantly improved performance compared to the standard multi-armed bandit approach with N-choose-K choices.
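To make the storage blow-up concrete: the naive baseline the abstract refers to treats each K-subset of the N arms as one "super-arm" and runs a standard bandit algorithm over all N-choose-K of them. The sketch below runs UCB1 over super-arms with a toy non-linear reward (the max over Bernoulli samples of the chosen arms — an assumption chosen purely for illustration). This is the baseline CMAB-SM improves on, not CMAB-SM itself, whose construction is not described in the abstract.

```python
import itertools
import math
import random

def ucb1_super_arms(N, K, arm_means, T, seed=0):
    """Naive baseline: UCB1 over all C(N, K) super-arms.

    Storage and per-step work grow with C(N, K); this is the
    state-space blow-up that a linear-space algorithm avoids.
    """
    rng = random.Random(seed)
    super_arms = list(itertools.combinations(range(N), K))  # C(N, K) entries
    counts = [0] * len(super_arms)
    totals = [0.0] * len(super_arms)

    def pull(subset):
        # Hypothetical non-linear reward: max over Bernoulli samples
        # of the K chosen arms (not from the paper; for illustration).
        return max(1.0 if rng.random() < arm_means[i] else 0.0 for i in subset)

    for t in range(1, T + 1):
        if t <= len(super_arms):
            idx = t - 1                     # play each super-arm once first
        else:
            idx = max(range(len(super_arms)),
                      key=lambda j: totals[j] / counts[j]
                      + math.sqrt(2 * math.log(t) / counts[j]))
        counts[idx] += 1
        totals[idx] += pull(super_arms[idx])

    best = max(range(len(super_arms)), key=lambda j: totals[j] / counts[j])
    return super_arms[best], len(super_arms)

subset, n_super_arms = ucb1_super_arms(
    N=6, K=2, arm_means=[0.1, 0.2, 0.3, 0.4, 0.8, 0.9], T=2000)
print(n_super_arms)  # C(6, 2) = 15 super-arms must be tracked
```

Even for this toy instance, 15 estimates must be stored; at N = 100, K = 5 the count exceeds 75 million, whereas a linear-space method keeps only O(N) statistics.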
