Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes

Journal of Machine Learning Research (JMLR), 2019
22 August 2019
Nathan Kallus, Masatoshi Uehara
    OffRL

Papers citing "Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes"

Showing 50 of 127 citing papers.
Conformal Prediction Beyond the Horizon: Distribution-Free Inference for Policy Evaluation
Feichen Gan, Youcun Lu, Yingying Zhang, Yukun Liu
29 Oct 2025 · OffRL
Learning density ratios in causal inference using Bregman-Riesz regression
Oliver J. Hines, Caleb H. Miles
17 Oct 2025 · CML
Latent Variable Modeling for Robust Causal Effect Estimation
Tetsuro Morimura, Tatsushi Oka, Yugo Suzuki, Daisuke Moriwaki
27 Aug 2025 · CML
A Two-armed Bandit Framework for A/B Testing
Jinjuan Wang, Qianglin Wen, Yu Zhang, Xiaodong Yan, Chengchun Shi
24 Jul 2025
Doubly Robust Alignment for Large Language Models
Erhan Xu, Kai Ye, Hongyi Zhou, Luhan Zhu, Francesco Quinzan, Chengchun Shi
01 Jun 2025
Demystifying the Paradox of Importance Sampling with an Estimated History-Dependent Behavior Policy in Off-Policy Evaluation
Hongyi Zhou, Josiah P. Hanna, Jin Zhu, Ying Yang, Chengchun Shi
28 May 2025 · OffRL
Treatment Effect Estimation for Optimal Decision-Making
Dennis Frauen, Valentyn Melnychuk, Jonas Schweisthal, Mihaela van der Schaar, Stefan Feuerriegel
19 May 2025 · CML
DOLCE: Decomposing Off-Policy Evaluation/Learning into Lagged and Current Effects
Shu Tamano
02 May 2025 · OffRL
Statistical Tractability of Off-policy Evaluation of History-dependent Policies in POMDPs
International Conference on Learning Representations (ICLR), 2025
Yuheng Zhang, Nan Jiang
03 Mar 2025 · OffRL
Statistical Inference in Reinforcement Learning: A Selective Survey
Chengchun Shi
22 Feb 2025 · OffRL
Learning Counterfactual Outcomes Under Rank Preservation
Peng Wu, Haoxuan Li, Chunyuan Zheng, Yan Zeng, Jiawei Chen, Yang Liu, Ruocheng Guo, Jianchao Tan
10 Feb 2025
Semiparametric Double Reinforcement Learning with Applications to Long-Term Causal Inference
Lars van der Laan, David Hubbard, Allen Tran, Nathan Kallus, Aurélien F. Bibaut
12 Jan 2025 · OffRL
A Graphical Approach to State Variable Selection in Off-policy Learning
Joakim Blach Andersen, Qingyuan Zhao
03 Jan 2025 · CML, OffRL
Logarithmic Neyman Regret for Adaptive Estimation of the Average Treatment Effect
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Ojash Neopane, Aaditya Ramdas, Aarti Singh
21 Nov 2024 · CML
Debiased Regression for Root-N-Consistent Conditional Mean Estimation
Masahiro Kato
18 Nov 2024
Hybrid Transfer Reinforcement Learning: Provable Sample Efficiency from Shifted-Dynamics Data
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Chengrui Qu, Laixi Shi, Kishan Panaganti, Pengcheng You, Adam Wierman
06 Nov 2024 · OffRL, OnRL
Primal-Dual Spectral Representation for Off-policy Evaluation
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Yang Hu, Tianyi Chen, Na Li, Kai Wang, Bo Dai
23 Oct 2024 · OffRL
CSPI-MT: Calibrated Safe Policy Improvement with Multiple Testing for Threshold Policies
Knowledge Discovery and Data Mining (KDD), 2024
Brian M Cho, Ana-Roxana Pop, Kyra Gan, Sam Corbett-Davies, Israel Nir, Ariel Evnine, Nathan Kallus
21 Aug 2024 · OffRL
Model-agnostic meta-learners for estimating heterogeneous treatment effects over time
Dennis Frauen, Konstantin Hess, Stefan Feuerriegel
07 Jul 2024
Structured Difference-of-Q via Orthogonal Learning
Defu Cao, Angela Zhou
12 Jun 2024
Combining Experimental and Historical Data for Policy Evaluation
Ting Li, Chengchun Shi, Qianglin Wen, Yang Sui, Yongli Qin, Chunbo Lai, Hongtu Zhu
01 Jun 2024 · OffRL
Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL
Yu-Juan Luo, Tianying Ji, Gang Hua, Jianwei Zhang, Huazhe Xu, Xianyuan Zhan
28 May 2024 · OffRL, OnRL
Efficient and Sharp Off-Policy Evaluation in Robust Markov Decision Processes
Andrew Bennett, Nathan Kallus, Miruna Oprescu, Wen Sun, Kaiwen Wang
29 Mar 2024 · AAML, OffRL
Spatially Randomized Designs Can Enhance Policy Evaluation
Ying Yang, Chengchun Shi, Fang Yao, Shouyang Wang, Hongtu Zhu
18 Mar 2024 · OffRL
Triple/Debiased Lasso for Statistical Inference of Conditional Average Treatment Effects
Masahiro Kato
05 Mar 2024 · CML
On the Curses of Future and History in Future-dependent Value Functions for Off-policy Evaluation
Yuheng Zhang, Nan Jiang
22 Feb 2024 · OffRL
Off-Policy Evaluation in Markov Decision Processes under Weak Distributional Overlap
Mohammad Mehrabi, Stefan Wager
13 Feb 2024 · OffRL
POTEC: Off-Policy Learning for Large Action Spaces via Two-Stage Policy Decomposition
Yuta Saito, Jihan Yao, Thorsten Joachims
09 Feb 2024 · OffRL
Evaluation of Active Feature Acquisition Methods for Static Feature Settings
Henrik von Kleist, Alireza Zamanian, I. Shpitser, Narges Ahmidi
06 Dec 2023 · OffRL
Marginal Density Ratio for Off-Policy Evaluation in Contextual Bandits
Neural Information Processing Systems (NeurIPS), 2023
Muhammad Faaiz Taufiq, Arnaud Doucet, Rob Cornish, Jean-François Ton
03 Dec 2023 · OffRL
Evaluation of Active Feature Acquisition Methods for Time-varying Feature Settings
Henrik von Kleist, Alireza Zamanian, I. Shpitser, Narges Ahmidi
03 Dec 2023 · OffRL
SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation
Haruka Kiyohara, Ren Kishimoto, K. Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito
30 Nov 2023 · OffRL, ELM
Randomization Inference When N Equals One
Biometrika, 2023
Tengyuan Liang, Benjamin Recht
25 Oct 2023 · CML
Distributional Shift-Aware Off-Policy Interval Estimation: A Unified Error Quantification Framework
Wenzhuo Zhou, Yuhan Li, Ruoqing Zhu, Annie Qu
23 Sep 2023 · OffRL
Off-policy Evaluation in Doubly Inhomogeneous Environments
Journal of the American Statistical Association (JASA), 2023
Zeyu Bian, C. Shi, Zhengling Qi, Lan Wang
14 Jun 2023 · OffRL
High-probability sample complexities for policy evaluation with linear function approximation
IEEE Transactions on Information Theory (IEEE Trans. Inf. Theory), 2023
Gen Li, Weichen Wu, Yuejie Chi, Cong Ma, Alessandro Rinaldo, Yuting Wei
30 May 2023 · OffRL
Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling
International Conference on Machine Learning (ICML), 2023
Yuta Saito, Qingyang Ren, Thorsten Joachims
14 May 2023 · CML, OffRL
Correcting for Interference in Experiments: A Case Study at Douyin
ACM Conference on Recommender Systems (RecSys), 2023
Vivek F. Farias, Hao Li, Tianyi Peng, Xinyuyang Ren, B. Hassibi, A. Zheng
04 May 2023
Conformal Off-Policy Evaluation in Markov Decision Processes
IEEE Conference on Decision and Control (CDC), 2023
Daniele Foffano, Alessio Russo, Alexandre Proutiere
05 Apr 2023 · OffRL
Hallucinated Adversarial Control for Conservative Offline Policy Evaluation
Conference on Uncertainty in Artificial Intelligence (UAI), 2023
Jonas Rothfuss, Bhavya Sukhija, Tobias Birchler, Parnian Kassraie, Andreas Krause
02 Mar 2023 · OffRL
Asking for Help: Failure Prediction in Behavioral Cloning through Value Approximation
IEEE International Conference on Robotics and Automation (ICRA), 2023
Cem Gokmen, Daniel Ho, Mohi Khansari
08 Feb 2023 · OffRL
Robust Fitted-Q-Evaluation and Iteration under Sequentially Exogenous Unobserved Confounders
David Bruns-Smith, Angela Zhou
01 Feb 2023 · OffRL
Asymptotic Inference for Multi-Stage Stationary Treatment Policy with Variable Selection
Daiqi Gao, Yufeng Liu, D. Zeng
29 Jan 2023 · OffRL
Model-based Offline Reinforcement Learning with Local Misspecification
AAAI Conference on Artificial Intelligence (AAAI), 2023
Kefan Dong, Yannis Flet-Berliac, Allen Nie, Emma Brunskill
26 Jan 2023 · OffRL
Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency
Wenlong Mou, Peng Ding, Martin J. Wainwright, Peter L. Bartlett
16 Jan 2023 · OffRL
Quantile Off-Policy Evaluation via Deep Conditional Generative Learning
Yang Xu, C. Shi, Shuang Luo, Lan Wang, R. Song
29 Dec 2022 · OffRL
A Review of Off-Policy Evaluation in Reinforcement Learning
Masatoshi Uehara, C. Shi, Nathan Kallus
13 Dec 2022 · OffRL
Beyond the Return: Off-policy Function Estimation under User-specified Error-measuring Distributions
Neural Information Processing Systems (NeurIPS), 2022
Audrey Huang, Nan Jiang
27 Oct 2022 · OffRL
A Unified Framework for Alternating Offline Model Training and Policy Learning
Neural Information Processing Systems (NeurIPS), 2022
Shentao Yang, Shujian Zhang, Yihao Feng, Mi Zhou
12 Oct 2022 · OffRL
Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient
Ming Yin, Mengdi Wang, Yu Wang
03 Oct 2022 · OffRL
Page 1 of 3