Optimal Off-Policy Evaluation from Multiple Logging Policies


21 October 2020
Nathan Kallus
Yuta Saito
Masatoshi Uehara
    OffRL

Papers citing "Optimal Off-Policy Evaluation from Multiple Logging Policies"

30 / 30 papers shown
DOLCE: Decomposing Off-Policy Evaluation/Learning into Lagged and Current Effects
Shu Tamano
Masanori Nojima
OffRL
138
0
0
02 May 2025
Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation
Yuta Saito
Shunsuke Aihara
Megumi Matsutani
Yusuke Narita
OffRL
117
75
0
17 Aug 2020
Near-Optimal Provable Uniform Convergence in Offline Policy Evaluation for Reinforcement Learning
Ming Yin
Yu Bai
Yu Wang
OffRL
75
31
0
07 Jul 2020
Efficient Evaluation of Natural Stochastic Policies in Offline Reinforcement Learning
Nathan Kallus
Masatoshi Uehara
OffRL
29
9
0
06 Jun 2020
GenDICE: Generalized Offline Estimation of Stationary Values
Ruiyi Zhang
Bo Dai
Lihong Li
Dale Schuurmans
OffRL
132
173
0
21 Feb 2020
Adaptive Estimator Selection for Off-Policy Evaluation
Yi-Hsun Su
Pavithra Srinath
A. Krishnamurthy
OffRL
23
45
0
18 Feb 2020
Statistically Efficient Off-Policy Policy Gradients
Nathan Kallus
Masatoshi Uehara
OffRL
47
37
0
10 Feb 2020
Inference for Batched Bandits
Kelly W. Zhang
Lucas Janson
Susan Murphy
64
82
0
08 Feb 2020
Confidence Intervals for Policy Evaluation in Adaptive Experiments
Vitor Hadad
David A. Hirshberg
Ruohan Zhan
Stefan Wager
Susan Athey
36
143
0
07 Nov 2019
Minimax Weight and Q-Function Learning for Off-Policy Evaluation
Masatoshi Uehara
Jiawei Huang
Nan Jiang
OffRL
99
186
0
28 Oct 2019
Infinite-horizon Off-Policy Policy Evaluation with Multiple Behavior Policies
Xinyun Chen
Lu Wang
Yizhe Hang
Heng Ge
H. Zha
OffRL
69
5
0
10 Oct 2019
Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes
Nathan Kallus
Masatoshi Uehara
OffRL
68
185
0
22 Aug 2019
Off-policy Learning for Multiple Loggers
Li He
Long Xia
Wei Zeng
Zhi-Ming Ma
Yue Zhao
Dawei Yin
OffRL
32
10
0
23 Jul 2019
Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning
Nathan Kallus
Masatoshi Uehara
OffRL
55
54
0
09 Jun 2019
Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation
Qiang Liu
Lihong Li
Ziyang Tang
Dengyong Zhou
OffRL
101
354
0
29 Oct 2018
Efficient Counterfactual Learning from Bandit Feedback
Yusuke Narita
Shota Yasui
Kohei Yata
OffRL
52
47
0
10 Sep 2018
Analysis of Noise Contrastive Estimation from the Perspective of Asymptotic Variance
Masatoshi Uehara
Takeru Matsuda
F. Komaki
28
13
0
24 Aug 2018
Counterfactual Mean Embeddings
Krikamol Muandet
Motonobu Kanagawa
Sorawit Saengkyongam
S. Marukatat
CML
OffRL
44
39
0
22 May 2018
More Robust Doubly Robust Off-policy Evaluation
Mehrdad Farajtabar
Yinlam Chow
Mohammad Ghavamzadeh
OffRL
54
267
0
10 Feb 2018
Balanced Policy Evaluation and Learning
Nathan Kallus
CML
OffRL
243
141
0
21 May 2017
Effective Evaluation using Logged Bandit Feedback from Multiple Loggers
Aman Agarwal
Soumya Basu
Tobias Schnabel
Thorsten Joachims
OffRL
95
68
0
17 Mar 2017
Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
Yu Wang
Alekh Agarwal
Miroslav Dudík
OffRL
63
220
0
04 Dec 2016
Recursive Partitioning for Personalization using Observational Data
Nathan Kallus
CML
67
99
0
31 Aug 2016
Safe and Efficient Off-Policy Reinforcement Learning
Rémi Munos
T. Stepleton
Anna Harutyunyan
Marc G. Bellemare
OffRL
125
611
0
08 Jun 2016
Off-policy evaluation for slate recommendation
Adith Swaminathan
A. Krishnamurthy
Alekh Agarwal
Miroslav Dudík
John Langford
Damien Jose
I. Zitouni
CML
OffRL
43
227
0
16 May 2016
Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
Philip S. Thomas
Emma Brunskill
OffRL
246
573
0
04 Apr 2016
Statistical inference for the mean outcome under a possibly non-unique optimal treatment strategy
Alexander Luedtke
M. J. van der Laan
84
220
0
24 Mar 2016
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning
Nan Jiang
Lihong Li
OffRL
148
621
0
11 Nov 2015
Doubly Robust Policy Evaluation and Optimization
Miroslav Dudík
D. Erhan
John Langford
Lihong Li
OffRL
133
285
0
10 Mar 2015
Learning from Logged Implicit Exploration Data
Alexander L. Strehl
John Langford
Sham Kakade
Lihong Li
OffRL
116
254
0
27 Feb 2010