1904.08473
Cited By
v1
v2 (latest)
Off-Policy Policy Gradient with State Distribution Correction
17 April 2019
Yao Liu
Adith Swaminathan
Alekh Agarwal
Emma Brunskill
OffRL
Papers citing
"Off-Policy Policy Gradient with State Distribution Correction"
50 / 55 papers shown
Online Optimization for Offline Safe Reinforcement Learning
Yassine Chemingui
Aryan Deshwal
Alan Fern
Thanh Nguyen-Tang
J. Doppa
OffRL
179
0
0
24 Oct 2025
On The Statistical Complexity of Offline Decision-Making
International Conference on Machine Learning (ICML), 2025
Thanh Nguyen-Tang
R. Arora
OffRL
544
2
0
10 Jan 2025
On Sample-Efficient Offline Reinforcement Learning: Data Diversity, Posterior Sampling, and Beyond
Thanh Nguyen-Tang
Raman Arora
OffRL
381
5
0
06 Jan 2024
Reward Dropout Improves Control: Bi-objective Perspective on Reinforced LM
Changhun Lee
Chiehyeon Lim
337
0
0
06 Oct 2023
$\mathcal{B}$-Coder: Value-Based Deep Reinforcement Learning for Program Synthesis
International Conference on Learning Representations (ICLR), 2023
Zishun Yu
Yunzhe Tao
Liyu Chen
Tao Sun
Hongxia Yang
359
20
0
04 Oct 2023
A General Offline Reinforcement Learning Framework for Interactive Recommendation
AAAI Conference on Artificial Intelligence (AAAI), 2021
Teng Xiao
Xuetao Zhang
OffRL
316
82
0
01 Oct 2023
Budgeting Counterfactual for Offline RL
Neural Information Processing Systems (NeurIPS), 2023
Yao Liu
Pratik Chaudhari
Rasool Fakoor
OffRL
368
4
0
12 Jul 2023
Reinforcement Learning Tutor Better Supported Lower Performers in a Math Task
Machine-mediated learning (ML), 2023
S. Ruan
Allen Nie
William Steenbergen
Jiayu He
JQ Zhang
...
Kyle Dang Nguyen
Catherine Y Wang
Rui Ying
James A. Landay
Emma Brunskill
271
33
0
11 Apr 2023
Adversarial Model for Offline Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2023
M. Bhardwaj
Tengyang Xie
Byron Boots
Nan Jiang
Ching-An Cheng
AAML
OffRL
340
38
0
21 Feb 2023
Coordinate Ascent for Off-Policy RL with Global Convergence Guarantees
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
Hsin-En Su
Yen-Ju Chen
Ping-Chun Hsieh
Xi Liu
OffRL
282
1
0
10 Dec 2022
Behavior Estimation from Multi-Source Data for Offline Reinforcement Learning
AAAI Conference on Artificial Intelligence (AAAI), 2022
Guoxi Zhang
H. Kashima
OffRL
242
2
0
29 Nov 2022
On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation
AAAI Conference on Artificial Intelligence (AAAI), 2022
Thanh Nguyen-Tang
Ming Yin
Sunil R. Gupta
Svetha Venkatesh
R. Arora
OffRL
240
24
0
23 Nov 2022
Data-Efficient Pipeline for Offline Reinforcement Learning with Limited Data
Neural Information Processing Systems (NeurIPS), 2022
Allen Nie
Yannis Flet-Berliac
Deon R. Jordan
William Steenbergen
Emma Brunskill
OffRL
353
14
0
16 Oct 2022
Offline Policy Optimization with Eligible Actions
Conference on Uncertainty in Artificial Intelligence (UAI), 2022
Yao Liu
Yannis Flet-Berliac
Emma Brunskill
OffRL
195
6
0
01 Jul 2022
Offline Stochastic Shortest Path: Learning, Evaluation and Towards Optimality
Conference on Uncertainty in Artificial Intelligence (UAI), 2022
Ming Yin
Wenjing Chen
Mengdi Wang
Yu Wang
OffRL
223
6
0
10 Jun 2022
Variance Reduction based Partial Trajectory Reuse to Accelerate Policy Gradient Optimization
Hua Zheng
Wei Xie
345
2
0
06 May 2022
When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning?
Aviral Kumar
Joey Hong
Anika Singh
Sergey Levine
OffRL
361
100
0
12 Apr 2022
Continual Auxiliary Task Learning
Neural Information Processing Systems (NeurIPS), 2022
Matt McLeod
Chun-Ping Lo
M. Schlegel
Andrew Jacobsen
Raksha Kumaraswamy
Martha White
Adam White
CLL
187
11
0
22 Feb 2022
Off-Policy Evaluation for Large Action Spaces via Embeddings
International Conference on Machine Learning (ICML), 2022
Yuta Saito
Thorsten Joachims
OffRL
303
60
0
13 Feb 2022
Model-Based Offline Meta-Reinforcement Learning with Regularization
International Conference on Learning Representations (ICLR), 2022
Sen Lin
Jialin Wan
Tengyu Xu
Yingbin Liang
Junshan Zhang
OffRL
434
20
0
07 Feb 2022
A Temporal-Difference Approach to Policy Gradient Estimation
International Conference on Machine Learning (ICML), 2022
Samuele Tosatto
Andrew Patterson
Martha White
A. R. Mahmood
OffRL
522
3
0
04 Feb 2022
Optimal Estimation of Off-Policy Policy Gradient via Double Fitted Iteration
Chengzhuo Ni
Ruiqi Zhang
Xiang Ji
Xuezhou Zhang
Mengdi Wang
OffRL
424
1
0
31 Jan 2022
Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch
Journal of Machine Learning Research (JMLR), 2021
Shangtong Zhang
Rémi Tachet des Combes
Romain Laroche
519
18
0
04 Nov 2021
Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning
Siyuan Zhang
Nan Jiang
OffRL
401
43
0
26 Oct 2021
Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm
Raghuram Bharadwaj Diddigi
Prateek Jain
P. J
S. Bhatnagar
CML
OffRL
351
3
0
19 Oct 2021
Towards Instance-Optimal Offline Reinforcement Learning with Pessimism
Ming Yin
Yu Wang
OffRL
344
88
0
17 Oct 2021
Offline Reinforcement Learning with Reverse Model-based Imagination
Jianhao Wang
Wenzhe Li
Haozhe Jiang
Guangxiang Zhu
Siyuan Li
Chongjie Zhang
OffRL
537
72
0
01 Oct 2021
Dr Jekyll and Mr Hyde: the Strange Case of Off-Policy Policy Updates
Romain Laroche
Rémi Tachet des Combes
230
7
0
29 Sep 2021
Mean-Field Multi-Agent Reinforcement Learning: A Decentralized Network Approach
Mathematics of Operations Research (MOR), 2021
Haotian Gu
Xin Guo
Xiaoli Wei
Renyuan Xu
OOD
332
49
0
05 Aug 2021
Learning Expected Emphatic Traces for Deep RL
Ray Jiang
Shangtong Zhang
Veronica Chelu
Adam White
Hado van Hasselt
OffRL
320
13
0
12 Jul 2021
The Curse of Passive Data Collection in Batch Reinforcement Learning
Chenjun Xiao
Ilbin Lee
Bo Dai
Dale Schuurmans
Csaba Szepesvári
OffRL
281
1
0
18 Jun 2021
Characterizing the Gap Between Actor-Critic and Policy Gradient
International Conference on Machine Learning (ICML), 2021
Junfeng Wen
Saurabh Kumar
Ramki Gummadi
Dale Schuurmans
222
18
0
13 Jun 2021
On the Convergence Rate of Off-Policy Policy Optimization Methods with Density-Ratio Correction
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021
Jiawei Huang
Nan Jiang
353
6
0
02 Jun 2021
Finite-Sample Analysis of Off-Policy Natural Actor-Critic with Linear Function Approximation
IEEE Control Systems Letters (L-CSS), 2021
Zaiwei Chen
S. Khodadadian
S. T. Maguluri
OffRL
297
33
0
26 May 2021
Nearly Horizon-Free Offline Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2021
Zhaolin Ren
Jialian Li
Bo Dai
S. Du
Sujay Sanghavi
OffRL
381
51
0
25 Mar 2021
On the Convergence and Optimality of Policy Gradient for Markov Coherent Risk
Audrey Huang
Liu Leqi
Zachary Chase Lipton
Kamyar Azizzadenesheli
270
23
0
04 Mar 2021
Harnessing Distribution Ratio Estimators for Learning Agents with Quality and Diversity
Tanmay Gangwani
Jian Peng
Yuanshuo Zhou
229
12
0
05 Nov 2020
Batch Reinforcement Learning with a Nonparametric Off-Policy Policy Gradient
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
Samuele Tosatto
João Carvalho
Jan Peters
OffRL
303
8
0
27 Oct 2020
Batch Value-function Approximation with Only Realizability
International Conference on Machine Learning (ICML), 2020
Tengyang Xie
Nan Jiang
OffRL
750
131
0
11 Aug 2020
Batch Policy Learning in Average Reward Markov Decision Processes
Annals of Statistics (Ann. Stat.), 2020
Peng Liao
Zhengling Qi
Runzhe Wan
P. Klasnja
Susan Murphy
OffRL
393
95
0
23 Jul 2020
EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL
Seyed Kamyar Seyed Ghasemipour
Dale Schuurmans
S. Gu
OffRL
592
137
0
21 Jul 2020
Off-policy Bandits with Deficient Support
Noveen Sachdeva
Yi-Hsun Su
Thorsten Joachims
OffRL
532
84
0
16 Jun 2020
Parameter-Based Value Functions
Francesco Faccio
Louis Kirsch
Jürgen Schmidhuber
OffRL
390
29
0
16 Jun 2020
A Survey of Deep Learning for Scientific Discovery
M. Raghu
Erica Schmidt
OOD
AI4CE
452
151
0
26 Mar 2020
Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning
International Conference on Learning Representations (ICLR), 2020
Ali Mousavi
Lihong Li
Qiang Liu
Denny Zhou
OffRL
364
33
0
24 Mar 2020
Optimizing Medical Treatment for Sepsis in Intensive Care: from Reinforcement Learning to Pre-Trial Evaluation
Luchen Li
I. Albert-Smet
Aldo A. Faisal
OffRL
196
12
0
13 Mar 2020
Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding
Neural Information Processing Systems (NeurIPS), 2020
Hongseok Namkoong
Ramtin Keramati
Steve Yadlowsky
Emma Brunskill
OffRL
424
72
0
12 Mar 2020
Minimax Value Interval for Off-Policy Evaluation and Policy Optimization
Nan Jiang
Jiawei Huang
OffRL
534
17
0
06 Feb 2020
Sublinear Optimal Policy Value Estimation in Contextual Bandits
International Conference on Artificial Intelligence and Statistics (AISTATS), 2019
Weihao Kong
Gregory Valiant
Emma Brunskill
OffRL
219
14
0
12 Dec 2019
AlgaeDICE: Policy Gradient from Arbitrary Experience
Ofir Nachum
Bo Dai
Ilya Kostrikov
Yinlam Chow
Lihong Li
Dale Schuurmans
OffRL
370
261
0
04 Dec 2019