2002.04014
Statistically Efficient Off-Policy Policy Gradients
International Conference on Machine Learning (ICML), 2020
10 February 2020
Nathan Kallus
Masatoshi Uehara
OffRL
Papers citing "Statistically Efficient Off-Policy Policy Gradients"
27 / 27 papers shown
ExGRPO: Learning to Reason from Experience
Runzhe Zhan
Yafu Li
Zhi Wang
Xiaoye Qu
Dongrui Liu
Jing Shao
Derek F. Wong
Yu Cheng
OffRL
LRM
193
11
1
02 Oct 2025
Doubly Robust Alignment for Large Language Models
Erhan Xu
Kai Ye
Hongyi Zhou
Luhan Zhu
Francesco Quinzan
Chengchun Shi
353
7
0
01 Jun 2025
Reinforcement Learning with Continuous Actions Under Unmeasured Confounding
Yuhan Li
Eugene Han
Yifan Hu
Wenzhuo Zhou
Zhengling Qi
Yifan Cui
Ruoqing Zhu
OffRL
983
1
0
01 May 2025
Enhancing PPO with Trajectory-Aware Hybrid Policies
Qisai Liu
Zhanhong Jiang
Hsin-Jung Yang
Mahsa Khosravi
Joshua R. Waite
Soumik Sarkar
322
1
0
21 Feb 2025
Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2024
Wenjia Meng
Qian Zheng
Long Yang
Yilong Yin
Gang Pan
OffRL
249
0
0
04 May 2024
Deal, or no deal (or who knows)? Forecasting Uncertainty in Conversations using Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Anthony Sicilia
Hyunwoo J. Kim
Khyathi Chandu
Malihe Alikhani
Jack Hessel
198
3
0
05 Feb 2024
Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning
Hanhan Zhou
Tian-Shing Lan
Vaneet Aggarwal
OffRL
338
4
0
28 Aug 2023
Inference on Optimal Dynamic Policies via Softmax Approximation
Qizhao Chen
Morgane Austern
Vasilis Syrgkanis
OffRL
429
5
0
08 Mar 2023
Robust Fitted-Q-Evaluation and Iteration under Sequentially Exogenous Unobserved Confounders
David Bruns-Smith
Angela Zhou
OffRL
693
14
0
01 Feb 2023
Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization
Journal of the American Statistical Association (JASA), 2023
C. Shi
Zhengling Qi
Jianing Wang
Fan Zhou
OffRL
226
9
0
05 Jan 2023
Offline Policy Evaluation and Optimization under Confounding
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
Chinmaya Kausik
Yangyi Lu
Kevin Tan
Maggie Makar
Yixin Wang
Ambuj Tewari
OffRL
424
15
0
29 Nov 2022
Truly Deterministic Policy Optimization
Neural Information Processing Systems (NeurIPS), 2022
Ehsan Saleh
Saba Ghaffari
Timothy Bretl
Matthew West
OffRL
295
3
0
30 May 2022
Review of Metrics to Measure the Stability, Robustness and Resilience of Reinforcement Learning
L. Pullum
443
6
0
22 Mar 2022
Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning
International Conference on Machine Learning (ICML), 2022
Nathan Kallus
Xiaojie Mao
Kaiwen Wang
Zhengyuan Zhou
OOD
OffRL
366
37
0
19 Feb 2022
Optimal Estimation of Off-Policy Policy Gradient via Double Fitted Iteration
Chengzhuo Ni
Ruiqi Zhang
Xiang Ji
Xuezhou Zhang
Mengdi Wang
OffRL
397
1
0
31 Jan 2022
On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function Estimation in Off-policy Evaluation
International Conference on Machine Learning (ICML), 2022
Xiaohong Chen
Zhengling Qi
OffRL
490
36
0
17 Jan 2022
Projected State-action Balancing Weights for Offline Reinforcement Learning
Annals of Statistics (Ann. Stat.), 2021
Jiayi Wang
Zhengling Qi
Raymond K. W. Wong
OffRL
264
24
0
10 Sep 2021
A Unified Off-Policy Evaluation Approach for General Value Function
Tengyu Xu
Zhuoran Yang
Zhaoran Wang
Yingbin Liang
OffRL
208
2
0
06 Jul 2021
Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality
International Conference on Machine Learning (ICML), 2021
Tengyu Xu
Zhuoran Yang
Zhaoran Wang
Yingbin Liang
OffRL
373
30
0
23 Feb 2021
Fast Rates for the Regret of Offline Reinforcement Learning
Annual Conference on Computational Learning Theory (COLT), 2021
Yichun Hu
Nathan Kallus
Masatoshi Uehara
OffRL
473
34
0
31 Jan 2021
Optimal Off-Policy Evaluation from Multiple Logging Policies
Nathan Kallus
Yuta Saito
Masatoshi Uehara
OffRL
347
44
0
21 Oct 2020
Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies
Nathan Kallus
Masatoshi Uehara
OffRL
178
16
0
06 Jun 2020
Efficient Evaluation of Natural Stochastic Policies in Offline Reinforcement Learning
Nathan Kallus
Masatoshi Uehara
OffRL
223
11
0
06 Jun 2020
Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning
Operations Research (OR), 2019
Nathan Kallus
Masatoshi Uehara
OffRL
447
106
0
12 Sep 2019
Global Optimality Guarantees For Policy Gradient Methods
Operations Research (OR), 2019
Jalaj Bhandari
Daniel Russo
595
224
0
05 Jun 2019
Learning When-to-Treat Policies
Journal of the American Statistical Association (JASA), 2019
Xinkun Nie
Emma Brunskill
Stefan Wager
CML
OffRL
296
98
0
23 May 2019
Relative Importance Sampling For Off-Policy Actor-Critic in Deep Reinforcement Learning
Mahammad Humayoo
Xueqi Cheng
BDL
OffRL
340
8
0
30 Oct 2018