arXiv:1806.11500

Bayesian Counterfactual Risk Minimization

29 June 2018
Ben London
Ted Sandler
    OffRL
Abstract

We present a Bayesian view of counterfactual risk minimization (CRM) for offline learning from logged bandit feedback. Using PAC-Bayesian analysis, we derive a new generalization bound for the truncated inverse propensity score estimator. We apply the bound to a class of Bayesian policies, which motivates a novel, potentially data-dependent, regularization technique for CRM. Experimental results indicate that this technique outperforms standard $L_2$ regularization, and that it is competitive with variance regularization while being both simpler to implement and more computationally efficient.
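
For readers unfamiliar with the estimator named in the abstract, the following is a minimal sketch of one common form of truncated inverse propensity scoring, in which each logged propensity is clipped from below at a threshold before reweighting. The function name, argument names, the default threshold, and the exact truncation rule are assumptions made for illustration; the abstract does not specify them, and the paper's definition may differ in detail.

```python
def truncated_ips_risk(losses, target_probs, logging_probs, tau=0.05):
    """Estimate a target policy's risk from logged bandit feedback.

    losses        -- observed loss for each logged action
    target_probs  -- probability the target policy assigns to each logged action
    logging_probs -- probability the logging policy assigned to that action
    tau           -- assumed truncation threshold in (0, 1]; propensities
                     below tau are replaced by tau to bound the weights
    """
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    n = len(losses)
    if n == 0 or len(target_probs) != n or len(logging_probs) != n:
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for loss, pi, p in zip(losses, target_probs, logging_probs):
        # Clipping the propensity caps the importance weight at 1 / tau,
        # trading a small bias for lower variance.
        total += loss * pi / max(p, tau)
    return total / n
```

Smaller values of `tau` reduce the bias introduced by clipping but allow larger importance weights, and therefore higher variance.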
