1503.02834
Doubly Robust Policy Evaluation and Optimization
10 March 2015
Miroslav Dudík
D. Erhan
John Langford
Lihong Li
OffRL
Papers citing
"Doubly Robust Policy Evaluation and Optimization"
15 / 15 papers shown
Title
Counterfactual Inference under Thompson Sampling
Olivier Jeunen
OffRL
LRM
53
0
0
03 Apr 2025
Statistical Inference in Reinforcement Learning: A Selective Survey
Chengchun Shi
OffRL
188
2
0
22 Feb 2025
Bayesian Off-Policy Evaluation and Learning for Large Action Spaces
Imad Aouali
Victor-Emmanuel Brunel
David Rohde
Anna Korba
OffRL
104
5
0
22 Feb 2024
Confounding-Robust Policy Improvement with Human-AI Teams
Ruijiang Gao
Mingzhang Yin
231
3
0
13 Oct 2023
Selective Uncertainty Propagation in Offline RL
Sanath Kumar Krishnamurthy
Shrey Modi
Tanmay Gangwani
S. Katariya
Branislav Kveton
A. Rangi
OffRL
139
0
0
01 Feb 2023
Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation
Yuta Saito
Shunsuke Aihara
Megumi Matsutani
Yusuke Narita
OffRL
122
75
0
17 Aug 2020
Taking a hint: How to leverage loss predictors in contextual bandits?
Chen-Yu Wei
Haipeng Luo
Alekh Agarwal
84
27
0
04 Mar 2020
Sample-efficient Nonstationary Policy Evaluation for Contextual Bandits
Miroslav Dudík
D. Erhan
John Langford
Lihong Li
OffRL
98
38
0
16 Oct 2012
Counterfactual Reasoning and Learning Systems
Léon Bottou
J. Peters
J. Q. Candela
Denis Xavier Charles
D. M. Chickering
Elon Portugaly
Dipankar Ray
Patrice Y. Simard
Edward Snelson
CML
OffRL
212
781
0
11 Sep 2012
Doubly Robust Policy Evaluation and Learning
Miroslav Dudík
John Langford
Lihong Li
OffRL
181
694
0
23 Mar 2011
Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms
Lihong Li
Wei Chu
John Langford
Xuanhui Wang
OffRL
164
574
0
31 Mar 2010
A Contextual-Bandit Approach to Personalized News Article Recommendation
Lihong Li
Wei Chu
John Langford
Robert Schapire
307
2,935
0
28 Feb 2010
Learning from Logged Implicit Exploration Data
Alexander L. Strehl
John Langford
Sham Kakade
Lihong Li
OffRL
121
254
0
27 Feb 2010
Contextual Bandit Algorithms with Supervised Learning Guarantees
A. Beygelzimer
John Langford
Lihong Li
L. Reyzin
Robert Schapire
OffRL
154
324
0
22 Feb 2010
The Offset Tree for Learning with Partial Labels
A. Beygelzimer
John Langford
139
184
0
21 Dec 2008