ResearchTrend.AI
Efficient Algorithms for Adversarial Contextual Learning

8 February 2016
Vasilis Syrgkanis
A. Krishnamurthy
Robert Schapire
Abstract

We provide the first oracle efficient sublinear regret algorithms for adversarial versions of the contextual bandit problem. In this problem, the learner repeatedly takes an action on the basis of a context and receives reward for the chosen action, with the goal of achieving reward competitive with a large class of policies. We analyze two settings: i) in the transductive setting, the learner knows the set of contexts a priori; ii) in the small separator setting, there exists a small set of contexts such that any two policies behave differently on at least one context in the set. Our algorithms fall into the follow-the-perturbed-leader family \cite{Kalai2005} and achieve regret $O(T^{3/4}\sqrt{K\log(N)})$ in the transductive setting and $O(T^{2/3} d^{3/4} K\sqrt{\log(N)})$ in the separator setting, where $K$ is the number of actions, $N$ is the number of baseline policies, and $d$ is the size of the separator. We in fact solve the more general adversarial contextual semi-bandit linear optimization problem, while in the full information setting we address the even more general contextual combinatorial optimization problem. We provide several extensions and implications of our algorithms, such as switching regret and efficient learning with predictable sequences.
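As background for the follow-the-perturbed-leader template the abstract refers to, here is a minimal sketch in the classic full-information experts setting, not the paper's oracle-efficient contextual algorithm. The function names and the exponential perturbation with scale on the order of $\sqrt{T}$ are illustrative assumptions, following the general FTPL recipe of Kalai and Vempala.

```python
import math
import random

def ftpl(n_experts, T, reward_fn, eta=None):
    """Follow the perturbed leader (experts setting, illustrative sketch).

    At each round, play the expert whose cumulative reward plus a
    one-time random perturbation is largest. reward_fn(t) returns the
    full reward vector for round t (full-information feedback).
    """
    if eta is None:
        eta = math.sqrt(T)  # perturbation scale ~ sqrt(T), a standard choice
    # Draw one i.i.d. perturbation per expert, fixed for all rounds.
    noise = [random.expovariate(1.0 / eta) for _ in range(n_experts)]
    cum = [0.0] * n_experts  # cumulative rewards of each expert
    total = 0.0              # reward collected by the learner
    for t in range(T):
        # Follow the perturbed leader.
        i = max(range(n_experts), key=lambda j: cum[j] + noise[j])
        r = reward_fn(t)
        total += r[i]
        for j in range(n_experts):
            cum[j] += r[j]
    return total, cum
```

In the paper's setting the analogous argmax over exponentially many policies is delegated to an optimization oracle, which is what makes the algorithms oracle efficient.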
