ResearchTrend.AI
Statistically Efficient Off-Policy Policy Gradients

International Conference on Machine Learning (ICML), 2020
10 February 2020
Nathan Kallus
Masatoshi Uehara
    OffRL
arXiv: 2002.04014

Papers citing "Statistically Efficient Off-Policy Policy Gradients"

27 / 27 papers shown
ExGRPO: Learning to Reason from Experience
Runzhe Zhan
Yafu Li
Zhi Wang
Xiaoye Qu
Dongrui Liu
Jing Shao
Derek F. Wong
Yu Cheng
OffRL LRM
193
11
1
02 Oct 2025
Doubly Robust Alignment for Large Language Models
Erhan Xu
Kai Ye
Hongyi Zhou
Luhan Zhu
Francesco Quinzan
Chengchun Shi
353
7
0
01 Jun 2025
Reinforcement Learning with Continuous Actions Under Unmeasured Confounding
Yuhan Li
Eugene Han
Yifan Hu
Wenzhuo Zhou
Zhengling Qi
Yifan Cui
Ruoqing Zhu
OffRL
983
1
0
01 May 2025
Enhancing PPO with Trajectory-Aware Hybrid Policies
Qisai Liu
Zhanhong Jiang
Hsin-Jung Yang
Mahsa Khosravi
Joshua R. Waite
Soumik Sarkar
322
1
0
21 Feb 2025
Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2024
Wenjia Meng
Qian Zheng
Long Yang
Yilong Yin
Gang Pan
OffRL
249
0
0
04 May 2024
Deal, or no deal (or who knows)? Forecasting Uncertainty in Conversations using Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Anthony Sicilia
Hyunwoo J. Kim
Khyathi Chandu
Malihe Alikhani
Jack Hessel
198
3
0
05 Feb 2024
Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning
Hanhan Zhou
Tian-Shing Lan
Vaneet Aggarwal
OffRL
338
4
0
28 Aug 2023
Inference on Optimal Dynamic Policies via Softmax Approximation
Qizhao Chen
Morgane Austern
Vasilis Syrgkanis
OffRL
429
5
0
08 Mar 2023
Robust Fitted-Q-Evaluation and Iteration under Sequentially Exogenous Unobserved Confounders
David Bruns-Smith
Angela Zhou
OffRL
693
14
0
01 Feb 2023
Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization
Journal of the American Statistical Association (JASA), 2023
C. Shi
Zhengling Qi
Jianing Wang
Fan Zhou
OffRL
226
9
0
05 Jan 2023
Offline Policy Evaluation and Optimization under Confounding
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
Chinmaya Kausik
Yangyi Lu
Kevin Tan
Maggie Makar
Yixin Wang
Ambuj Tewari
OffRL
424
15
0
29 Nov 2022
Truly Deterministic Policy Optimization
Neural Information Processing Systems (NeurIPS), 2022
Ehsan Saleh
Saba Ghaffari
Timothy Bretl
Matthew West
OffRL
295
3
0
30 May 2022
Review of Metrics to Measure the Stability, Robustness and Resilience of Reinforcement Learning
L. Pullum
443
6
0
22 Mar 2022
Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning
International Conference on Machine Learning (ICML), 2022
Nathan Kallus
Xiaojie Mao
Kaiwen Wang
Zhengyuan Zhou
OOD OffRL
366
37
0
19 Feb 2022
Optimal Estimation of Off-Policy Policy Gradient via Double Fitted Iteration
Chengzhuo Ni
Ruiqi Zhang
Xiang Ji
Xuezhou Zhang
Mengdi Wang
OffRL
397
1
0
31 Jan 2022
On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function Estimation in Off-policy Evaluation
International Conference on Machine Learning (ICML), 2022
Xiaohong Chen
Zhengling Qi
OffRL
490
36
0
17 Jan 2022
Projected State-action Balancing Weights for Offline Reinforcement Learning
Annals of Statistics (Ann. Stat.), 2021
Jiayi Wang
Zhengling Qi
Raymond K. W. Wong
OffRL
264
24
0
10 Sep 2021
A Unified Off-Policy Evaluation Approach for General Value Function
Tengyu Xu
Zhuoran Yang
Zhaoran Wang
Yingbin Liang
OffRL
208
2
0
06 Jul 2021
Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality
International Conference on Machine Learning (ICML), 2021
Tengyu Xu
Zhuoran Yang
Zhaoran Wang
Yingbin Liang
OffRL
373
30
0
23 Feb 2021
Fast Rates for the Regret of Offline Reinforcement Learning
Annual Conference on Computational Learning Theory (COLT), 2021
Yichun Hu
Nathan Kallus
Masatoshi Uehara
OffRL
473
34
0
31 Jan 2021
Optimal Off-Policy Evaluation from Multiple Logging Policies
Nathan Kallus
Yuta Saito
Masatoshi Uehara
OffRL
347
44
0
21 Oct 2020
Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies
Nathan Kallus
Masatoshi Uehara
OffRL
178
16
0
06 Jun 2020
Efficient Evaluation of Natural Stochastic Policies in Offline Reinforcement Learning
Nathan Kallus
Masatoshi Uehara
OffRL
223
11
0
06 Jun 2020
Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning
Operations Research (OR), 2019
Nathan Kallus
Masatoshi Uehara
OffRL
447
106
0
12 Sep 2019
Global Optimality Guarantees For Policy Gradient Methods
Operations Research (OR), 2019
Jalaj Bhandari
Daniel Russo
595
224
0
05 Jun 2019
Learning When-to-Treat Policies
Journal of the American Statistical Association (JASA), 2019
Xinkun Nie
Emma Brunskill
Stefan Wager
CML OffRL
296
98
0
23 May 2019
Relative Importance Sampling For Off-Policy Actor-Critic in Deep Reinforcement Learning
Mahammad Humayoo
Xueqi Cheng
BDL OffRL
340
8
0
30 Oct 2018