Off-Policy Policy Gradient with State Distribution Correction

17 April 2019
Yao Liu
Adith Swaminathan
Alekh Agarwal
Emma Brunskill
    OffRL

Papers citing "Off-Policy Policy Gradient with State Distribution Correction"

50 / 55 papers shown
Online Optimization for Offline Safe Reinforcement Learning
Yassine Chemingui
Aryan Deshwal
Alan Fern
Thanh Nguyen-Tang
J. Doppa
OffRL
179
0
0
24 Oct 2025
On The Statistical Complexity of Offline Decision-Making
International Conference on Machine Learning (ICML), 2025
Thanh Nguyen-Tang
R. Arora
OffRL
544
2
0
10 Jan 2025
On Sample-Efficient Offline Reinforcement Learning: Data Diversity, Posterior Sampling, and Beyond
Thanh Nguyen-Tang
Raman Arora
OffRL
381
5
0
06 Jan 2024
Reward Dropout Improves Control: Bi-objective Perspective on Reinforced LM
Changhun Lee
Chiehyeon Lim
337
0
0
06 Oct 2023
$\mathcal{B}$-Coder: Value-Based Deep Reinforcement Learning for Program Synthesis
International Conference on Learning Representations (ICLR), 2023
Zishun Yu
Yunzhe Tao
Liyu Chen
Tao Sun
Hongxia Yang
359
20
0
04 Oct 2023
A General Offline Reinforcement Learning Framework for Interactive Recommendation
AAAI Conference on Artificial Intelligence (AAAI), 2021
Teng Xiao
Xuetao Zhang
OffRL
316
82
0
01 Oct 2023
Budgeting Counterfactual for Offline RL
Neural Information Processing Systems (NeurIPS), 2023
Yao Liu
Pratik Chaudhari
Rasool Fakoor
OffRL
368
4
0
12 Jul 2023
Reinforcement Learning Tutor Better Supported Lower Performers in a Math Task
Machine-mediated learning (ML), 2023
S. Ruan
Allen Nie
William Steenbergen
Jiayu He
JQ Zhang
...
Kyle Dang Nguyen
Catherine Y Wang
Rui Ying
James A. Landay
Emma Brunskill
271
33
0
11 Apr 2023
Adversarial Model for Offline Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2023
M. Bhardwaj
Tengyang Xie
Byron Boots
Nan Jiang
Ching-An Cheng
AAML
OffRL
340
38
0
21 Feb 2023
Coordinate Ascent for Off-Policy RL with Global Convergence Guarantees
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
Hsin-En Su
Yen-Ju Chen
Ping-Chun Hsieh
Xi Liu
OffRL
282
1
0
10 Dec 2022
Behavior Estimation from Multi-Source Data for Offline Reinforcement Learning
AAAI Conference on Artificial Intelligence (AAAI), 2022
Guoxi Zhang
H. Kashima
OffRL
242
2
0
29 Nov 2022
On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation
AAAI Conference on Artificial Intelligence (AAAI), 2022
Thanh Nguyen-Tang
Ming Yin
Sunil R. Gupta
Svetha Venkatesh
R. Arora
OffRL
240
24
0
23 Nov 2022
Data-Efficient Pipeline for Offline Reinforcement Learning with Limited Data
Neural Information Processing Systems (NeurIPS), 2022
Allen Nie
Yannis Flet-Berliac
Deon R. Jordan
William Steenbergen
Emma Brunskill
OffRL
353
14
0
16 Oct 2022
Offline Policy Optimization with Eligible Actions
Conference on Uncertainty in Artificial Intelligence (UAI), 2022
Yao Liu
Yannis Flet-Berliac
Emma Brunskill
OffRL
195
6
0
01 Jul 2022
Offline Stochastic Shortest Path: Learning, Evaluation and Towards Optimality
Conference on Uncertainty in Artificial Intelligence (UAI), 2022
Ming Yin
Wenjing Chen
Mengdi Wang
Yu Wang
OffRL
223
6
0
10 Jun 2022
Variance Reduction based Partial Trajectory Reuse to Accelerate Policy Gradient Optimization
Hua Zheng
Wei Xie
345
2
0
06 May 2022
When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning?
Aviral Kumar
Joey Hong
Anika Singh
Sergey Levine
OffRL
361
100
0
12 Apr 2022
Continual Auxiliary Task Learning
Neural Information Processing Systems (NeurIPS), 2022
Matt McLeod
Chun-Ping Lo
M. Schlegel
Andrew Jacobsen
Raksha Kumaraswamy
Martha White
Adam White
CLL
187
11
0
22 Feb 2022
Off-Policy Evaluation for Large Action Spaces via Embeddings
International Conference on Machine Learning (ICML), 2022
Yuta Saito
Thorsten Joachims
OffRL
303
60
0
13 Feb 2022
Model-Based Offline Meta-Reinforcement Learning with Regularization
International Conference on Learning Representations (ICLR), 2022
Sen Lin
Jialin Wan
Tengyu Xu
Yingbin Liang
Junshan Zhang
OffRL
434
20
0
07 Feb 2022
A Temporal-Difference Approach to Policy Gradient Estimation
International Conference on Machine Learning (ICML), 2022
Samuele Tosatto
Andrew Patterson
Martha White
A. R. Mahmood
OffRL
522
3
0
04 Feb 2022
Optimal Estimation of Off-Policy Policy Gradient via Double Fitted Iteration
Chengzhuo Ni
Ruiqi Zhang
Xiang Ji
Xuezhou Zhang
Mengdi Wang
OffRL
424
1
0
31 Jan 2022
Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch
Journal of machine learning research (JMLR), 2021
Shangtong Zhang
Rémi Tachet des Combes
Romain Laroche
519
18
0
04 Nov 2021
Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning
Siyuan Zhang
Nan Jiang
OffRL
401
43
0
26 Oct 2021
Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm
Raghuram Bharadwaj Diddigi
Prateek Jain
P. J
S. Bhatnagar
CML
OffRL
351
3
0
19 Oct 2021
Towards Instance-Optimal Offline Reinforcement Learning with Pessimism
Ming Yin
Yu Wang
OffRL
344
88
0
17 Oct 2021
Offline Reinforcement Learning with Reverse Model-based Imagination
Jianhao Wang
Wenzhe Li
Haozhe Jiang
Guangxiang Zhu
Siyuan Li
Chongjie Zhang
OffRL
537
72
0
01 Oct 2021
Dr Jekyll and Mr Hyde: the Strange Case of Off-Policy Policy Updates
Romain Laroche
Rémi Tachet des Combes
230
7
0
29 Sep 2021
Mean-Field Multi-Agent Reinforcement Learning: A Decentralized Network Approach
Mathematics of Operations Research (MOR), 2021
Haotian Gu
Xin Guo
Xiaoli Wei
Renyuan Xu
OOD
332
49
0
05 Aug 2021
Learning Expected Emphatic Traces for Deep RL
Ray Jiang
Shangtong Zhang
Veronica Chelu
Adam White
Hado van Hasselt
OffRL
320
13
0
12 Jul 2021
The Curse of Passive Data Collection in Batch Reinforcement Learning
Chenjun Xiao
Ilbin Lee
Bo Dai
Dale Schuurmans
Csaba Szepesvári
OffRL
281
1
0
18 Jun 2021
Characterizing the Gap Between Actor-Critic and Policy Gradient
International Conference on Machine Learning (ICML), 2021
Junfeng Wen
Saurabh Kumar
Ramki Gummadi
Dale Schuurmans
222
18
0
13 Jun 2021
On the Convergence Rate of Off-Policy Policy Optimization Methods with Density-Ratio Correction
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021
Jiawei Huang
Nan Jiang
353
6
0
02 Jun 2021
Finite-Sample Analysis of Off-Policy Natural Actor-Critic with Linear Function Approximation
IEEE Control Systems Letters (L-CSS), 2021
Zaiwei Chen
S. Khodadadian
S. T. Maguluri
OffRL
297
33
0
26 May 2021
Nearly Horizon-Free Offline Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2021
Zhaolin Ren
Jialian Li
Bo Dai
S. Du
Sujay Sanghavi
OffRL
381
51
0
25 Mar 2021
On the Convergence and Optimality of Policy Gradient for Markov Coherent Risk
Audrey Huang
Liu Leqi
Zachary Chase Lipton
Kamyar Azizzadenesheli
270
23
0
04 Mar 2021
Harnessing Distribution Ratio Estimators for Learning Agents with Quality and Diversity
Tanmay Gangwani
Jian Peng
Yuanshuo Zhou
229
12
0
05 Nov 2020
Batch Reinforcement Learning with a Nonparametric Off-Policy Policy Gradient
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
Samuele Tosatto
João Carvalho
Jan Peters
OffRL
303
8
0
27 Oct 2020
Batch Value-function Approximation with Only Realizability
International Conference on Machine Learning (ICML), 2020
Tengyang Xie
Nan Jiang
OffRL
750
131
0
11 Aug 2020
Batch Policy Learning in Average Reward Markov Decision Processes
Annals of Statistics (Ann. Stat.), 2020
Peng Liao
Zhengling Qi
Runzhe Wan
P. Klasnja
Susan Murphy
OffRL
393
95
0
23 Jul 2020
EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL
Seyed Kamyar Seyed Ghasemipour
Dale Schuurmans
S. Gu
OffRL
592
137
0
21 Jul 2020
Off-policy Bandits with Deficient Support
Noveen Sachdeva
Yi-Hsun Su
Thorsten Joachims
OffRL
532
84
0
16 Jun 2020
Parameter-Based Value Functions
Francesco Faccio
Louis Kirsch
Jürgen Schmidhuber
OffRL
390
29
0
16 Jun 2020
A Survey of Deep Learning for Scientific Discovery
M. Raghu
Erica Schmidt
OOD
AI4CE
452
151
0
26 Mar 2020
Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning
International Conference on Learning Representations (ICLR), 2020
Ali Mousavi
Lihong Li
Qiang Liu
Denny Zhou
OffRL
364
33
0
24 Mar 2020
Optimizing Medical Treatment for Sepsis in Intensive Care: from Reinforcement Learning to Pre-Trial Evaluation
Luchen Li
I. Albert-Smet
Aldo A. Faisal
OffRL
196
12
0
13 Mar 2020
Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding
Neural Information Processing Systems (NeurIPS), 2020
Hongseok Namkoong
Ramtin Keramati
Steve Yadlowsky
Emma Brunskill
OffRL
424
72
0
12 Mar 2020
Minimax Value Interval for Off-Policy Evaluation and Policy Optimization
Nan Jiang
Jiawei Huang
OffRL
534
17
0
06 Feb 2020
Sublinear Optimal Policy Value Estimation in Contextual Bandits
International Conference on Artificial Intelligence and Statistics (AISTATS), 2019
Weihao Kong
Gregory Valiant
Emma Brunskill
OffRL
219
14
0
12 Dec 2019
AlgaeDICE: Policy Gradient from Arbitrary Experience
Ofir Nachum
Bo Dai
Ilya Kostrikov
Yinlam Chow
Lihong Li
Dale Schuurmans
OffRL
370
261
0
04 Dec 2019