1503.02834
Doubly Robust Policy Evaluation and Optimization
10 March 2015
Miroslav Dudík
D. Erhan
John Langford
Lihong Li
OffRL
Papers citing "Doubly Robust Policy Evaluation and Optimization"
15 / 15 papers shown
Counterfactual Inference under Thompson Sampling
Olivier Jeunen
OffRL
LRM
53
0
0
03 Apr 2025
Statistical Inference in Reinforcement Learning: A Selective Survey
Chengchun Shi
OffRL
181
2
0
22 Feb 2025
Bayesian Off-Policy Evaluation and Learning for Large Action Spaces
Imad Aouali
Victor-Emmanuel Brunel
David Rohde
Anna Korba
OffRL
97
5
0
22 Feb 2024
Confounding-Robust Policy Improvement with Human-AI Teams
Ruijiang Gao
Mingzhang Yin
220
3
0
13 Oct 2023
Selective Uncertainty Propagation in Offline RL
Sanath Kumar Krishnamurthy
Shrey Modi
Tanmay Gangwani
S. Katariya
Branislav Kveton
A. Rangi
OffRL
137
0
0
01 Feb 2023
Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation
Yuta Saito
Shunsuke Aihara
Megumi Matsutani
Yusuke Narita
OffRL
115
75
0
17 Aug 2020
Taking a hint: How to leverage loss predictors in contextual bandits?
Chen-Yu Wei
Haipeng Luo
Alekh Agarwal
82
27
0
04 Mar 2020
Sample-efficient Nonstationary Policy Evaluation for Contextual Bandits
Miroslav Dudík
D. Erhan
John Langford
Lihong Li
OffRL
94
38
0
16 Oct 2012
Counterfactual Reasoning and Learning Systems
Léon Bottou
J. Peters
J. Q. Candela
Denis Xavier Charles
D. M. Chickering
Elon Portugaly
Dipankar Ray
Patrice Y. Simard
Edward Snelson
CML
OffRL
199
781
0
11 Sep 2012
Doubly Robust Policy Evaluation and Learning
Miroslav Dudík
John Langford
Lihong Li
OffRL
173
694
0
23 Mar 2011
Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms
Lihong Li
Wei Chu
John Langford
Xuanhui Wang
OffRL
159
574
0
31 Mar 2010
A Contextual-Bandit Approach to Personalized News Article Recommendation
Lihong Li
Wei Chu
John Langford
Robert Schapire
296
2,935
0
28 Feb 2010
Learning from Logged Implicit Exploration Data
Alexander L. Strehl
John Langford
Sham Kakade
Lihong Li
OffRL
116
254
0
27 Feb 2010
Contextual Bandit Algorithms with Supervised Learning Guarantees
A. Beygelzimer
John Langford
Lihong Li
L. Reyzin
Robert Schapire
OffRL
149
324
0
22 Feb 2010
The Offset Tree for Learning with Partial Labels
A. Beygelzimer
John Langford
128
184
0
21 Dec 2008