ResearchTrend.AI
Doubly Robust Policy Evaluation and Optimization

10 March 2015
Miroslav Dudík
D. Erhan
John Langford
Lihong Li
    OffRL
arXiv: 1503.02834

Papers citing "Doubly Robust Policy Evaluation and Optimization"

15 / 15 papers shown
Counterfactual Inference under Thompson Sampling
Olivier Jeunen
OffRL
LRM
53
0
0
03 Apr 2025
Statistical Inference in Reinforcement Learning: A Selective Survey
Chengchun Shi
OffRL
188
2
0
22 Feb 2025
Bayesian Off-Policy Evaluation and Learning for Large Action Spaces
Imad Aouali
Victor-Emmanuel Brunel
David Rohde
Anna Korba
OffRL
104
5
0
22 Feb 2024
Confounding-Robust Policy Improvement with Human-AI Teams
Ruijiang Gao
Mingzhang Yin
231
3
0
13 Oct 2023
Selective Uncertainty Propagation in Offline RL
Sanath Kumar Krishnamurthy
Shrey Modi
Tanmay Gangwani
S. Katariya
Branislav Kveton
A. Rangi
OffRL
139
0
0
01 Feb 2023
Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation
Yuta Saito
Shunsuke Aihara
Megumi Matsutani
Yusuke Narita
OffRL
122
75
0
17 Aug 2020
Taking a hint: How to leverage loss predictors in contextual bandits?
Chen-Yu Wei
Haipeng Luo
Alekh Agarwal
84
27
0
04 Mar 2020
Sample-efficient Nonstationary Policy Evaluation for Contextual Bandits
Miroslav Dudík
D. Erhan
John Langford
Lihong Li
OffRL
98
38
0
16 Oct 2012
Counterfactual Reasoning and Learning Systems
Léon Bottou
J. Peters
J. Q. Candela
Denis Xavier Charles
D. M. Chickering
Elon Portugaly
Dipankar Ray
Patrice Y. Simard
Edward Snelson
CML
OffRL
212
781
0
11 Sep 2012
Doubly Robust Policy Evaluation and Learning
Miroslav Dudík
John Langford
Lihong Li
OffRL
181
694
0
23 Mar 2011
Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms
Lihong Li
Wei Chu
John Langford
Xuanhui Wang
OffRL
164
574
0
31 Mar 2010
A Contextual-Bandit Approach to Personalized News Article Recommendation
Lihong Li
Wei Chu
John Langford
Robert Schapire
307
2,935
0
28 Feb 2010
Learning from Logged Implicit Exploration Data
Alexander L. Strehl
John Langford
Sham Kakade
Lihong Li
OffRL
121
254
0
27 Feb 2010
Contextual Bandit Algorithms with Supervised Learning Guarantees
A. Beygelzimer
John Langford
Lihong Li
L. Reyzin
Robert Schapire
OffRL
154
324
0
22 Feb 2010
The Offset Tree for Learning with Partial Labels
A. Beygelzimer
John Langford
139
184
0
21 Dec 2008