Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits

9 February 2018
Zeyuan Allen-Zhu
Sébastien Bubeck
Yuanzhi Li
Abstract

Regret bounds in online learning compare the player's performance to $L^*$, the optimal performance in hindsight with a fixed strategy. Typically such bounds scale with the square root of the time horizon $T$. The more refined concept of a first-order regret bound replaces this with a scaling of $\sqrt{L^*}$, which may be much smaller than $\sqrt{T}$. It is well known that minor variants of standard algorithms satisfy first-order regret bounds in the full information and multi-armed bandit settings. In a COLT 2017 open problem, Agarwal, Krishnamurthy, Langford, Luo, and Schapire raised the issue that existing techniques do not seem sufficient to obtain first-order regret bounds for the contextual bandit problem. In the present paper, we resolve this open problem by presenting a new strategy based on augmenting the policy space.
