Pessimism for Offline Linear Contextual Bandits using $\ell_p$ Confidence Sets

21 May 2022
Gen Li
Cong Ma
Nathan Srebro
OffRL
arXiv:2205.10671
Abstract

We present a family $\{\hat{\pi}_p\}_{p \ge 1}$ of pessimistic learning rules for offline learning of linear contextual bandits, relying on confidence sets with respect to different $\ell_p$ norms, where $\hat{\pi}_2$ corresponds to Bellman-consistent pessimism (BCP), while $\hat{\pi}_\infty$ is a novel generalization of the lower confidence bound (LCB) to the linear setting. We show that the novel $\hat{\pi}_\infty$ learning rule is, in a sense, adaptively optimal, as it achieves the minimax performance (up to log factors) against all $\ell_q$-constrained problems, and as such it strictly dominates all other predictors in the family, including $\hat{\pi}_2$.
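The abstract does not spell out the learning rules themselves, but the $\hat{\pi}_2$ (BCP-style) member of the family can be pictured as a lower-confidence-bound action choice with an ellipsoidal uncertainty penalty built from offline data. The sketch below is an illustrative, non-authoritative rendering of that idea in NumPy; the function names (fit_ridge, pessimistic_action), the ridge parameter lam, and the confidence radius beta are assumptions chosen for illustration and are not taken from the paper, whose $\ell_p$ confidence-set construction (in particular the $\hat{\pi}_\infty$ rule) differs in its details.

```python
# Illustrative sketch only: a pessimistic (LCB-style) action selector for an
# offline linear contextual bandit, roughly in the spirit of the l2 / BCP
# member of the family described in the abstract. The confidence radius
# `beta`, the ridge parameter `lam`, and the feature map are placeholder
# assumptions, not the paper's construction.
import numpy as np

def fit_ridge(Phi, rewards, lam=1.0):
    """Ridge-regression estimate of the linear reward parameter.

    Phi     : (n, d) array of features phi(x_i, a_i) from the offline dataset
    rewards : (n,) array of observed rewards
    lam     : ridge regularization strength
    Returns (theta_hat, Lambda) with Lambda = Phi^T Phi + lam * I.
    """
    d = Phi.shape[1]
    Lambda = Phi.T @ Phi + lam * np.eye(d)
    theta_hat = np.linalg.solve(Lambda, Phi.T @ rewards)
    return theta_hat, Lambda

def pessimistic_action(theta_hat, Lambda, candidate_features, beta=1.0):
    """Pick the action maximizing a lower confidence bound on the reward.

    For each candidate feature vector phi(x, a), the LCB is
        phi^T theta_hat - beta * sqrt(phi^T Lambda^{-1} phi),
    i.e. an ellipsoidal (l2) penalty that is large in directions the
    offline data covers poorly.
    """
    Lambda_inv = np.linalg.inv(Lambda)
    lcbs = []
    for phi in candidate_features:
        bonus = np.sqrt(phi @ Lambda_inv @ phi)
        lcbs.append(phi @ theta_hat - beta * bonus)
    return int(np.argmax(lcbs)), np.array(lcbs)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n = 5, 200
    theta_star = rng.normal(size=d)          # unknown true parameter
    Phi = rng.normal(size=(n, d))            # offline features phi(x_i, a_i)
    rewards = Phi @ theta_star + 0.1 * rng.normal(size=n)

    theta_hat, Lambda = fit_ridge(Phi, rewards)
    # Three hypothetical actions for a new context, one feature vector each.
    candidates = rng.normal(size=(3, d))
    best, lcbs = pessimistic_action(theta_hat, Lambda, candidates)
    print("LCB values:", np.round(lcbs, 3), "-> chosen action:", best)
```

The penalty term here is the standard elliptical-norm bonus used with ridge estimates; the paper's contribution, per the abstract, is to replace this $\ell_2$-type confidence set with $\ell_p$ variants and to show the $\ell_\infty$ variant is adaptively minimax optimal.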
