1205.4839
Off-Policy Actor-Critic
International Conference on Machine Learning (ICML), 2012
22 May 2012
T. Degris
Martha White
R. Sutton
OffRL
CML
Papers citing "Off-Policy Actor-Critic"
50 / 117 papers shown
The Actor-Critic Update Order Matters for PPO in Federated Reinforcement Learning
Zhijie Xie
Shenghui Song
282
0
0
02 Jun 2025
Multi-agent Off-policy Actor-Critic Reinforcement Learning for Partially Observable Environments
Ainur Zhaikhan
Ali H. Sayed
OffRL
278
1
0
06 Jul 2024
Distillation Policy Optimization
Jianfei Ma
OffRL
620
1
0
01 Feb 2023
Reinforcement Learning with Large Action Spaces for Neural Machine Translation
International Conference on Computational Linguistics (COLING), 2022
Asaf Yehudai
Leshem Choshen
Lior Fox
Omri Abend
313
7
0
06 Oct 2022
Improved Policy Optimization for Online Imitation Learning
J. Lavington
Sharan Vaswani
Mark Schmidt
OffRL
325
7
0
29 Jul 2022
Interactive Imitation Learning in Robotics based on Simulations
Xinyi Liu
300
2
0
26 Jul 2022
Continual Meta-Reinforcement Learning for UAV-Aided Vehicular Wireless Networks
Riccardo Marini
Sangwoo Park
Osvaldo Simeone
C. Buratti
366
11
0
13 Jul 2022
Efficient Distributed Framework for Collaborative Multi-Agent Reinforcement Learning
Shuhan Qi
Shuhao Zhang
Xiaohan Hou
Jia-jia Zhang
Xinyu Wang
Jing Xiao
227
0
0
11 May 2022
Variance Reduction based Partial Trajectory Reuse to Accelerate Policy Gradient Optimization
Hua Zheng
Wei Xie
344
2
0
06 May 2022
TASAC: a twin-actor reinforcement learning framework with stochastic policy for batch process control
Tanuja Joshi
H. Kodamana
Harikumar Kandath
N. Kaisare
OffRL
123
0
0
22 Apr 2022
Remember and Forget Experience Replay for Multi-Agent Reinforcement Learning
Pascal Weber
Daniel Wälchli
Mustafa Zeqiri
Petros Koumoutsakos
CLL
OffRL
254
9
0
24 Mar 2022
Residual Robot Learning for Object-Centric Probabilistic Movement Primitives
João Carvalho
Dorothea Koert
Marek Daniv
Jan Peters
258
12
0
08 Mar 2022
A Temporal-Difference Approach to Policy Gradient Estimation
International Conference on Machine Learning (ICML), 2022
Samuele Tosatto
Andrew Patterson
Martha White
A. R. Mahmood
OffRL
522
3
0
04 Feb 2022
Optimal Estimation of Off-Policy Policy Gradient via Double Fitted Iteration
Chengzhuo Ni
Ruiqi Zhang
Xiang Ji
Xuezhou Zhang
Mengdi Wang
OffRL
423
1
0
31 Jan 2022
GCS: Graph-based Coordination Strategy for Multi-Agent Reinforcement Learning
Adaptive Agents and Multi-Agent Systems (AAMAS), 2022
Jingqing Ruan
Yali Du
Xuantang Xiong
Dengpeng Xing
Xiyun Li
Linghui Meng
Haifeng Zhang
Jun Wang
Bo Xu
184
39
0
17 Jan 2022
An Analytical Update Rule for General Policy Optimization
Hepeng Li
Nicholas Clavette
Haibo He
280
5
0
03 Dec 2021
Adaptively Calibrated Critic Estimates for Deep Reinforcement Learning
Nicolai Dorka
Tim Welschehold
Joschka Boedecker
Wolfram Burgard
OffRL
267
14
0
24 Nov 2021
Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction
A. Ahmad
Shuo Cheng
D. Saraswat
Aly El Gamal
Wenjie Wang
Gurmukh Johal
OffRL
OnRL
216
1
0
22 Oct 2021
Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm
Raghuram Bharadwaj Diddigi
Prateek Jain
P. J
S. Bhatnagar
CML
OffRL
351
3
0
19 Oct 2021
Offline Reinforcement Learning with Soft Behavior Regularization
Haoran Xu
Xianyuan Zhan
Jianxiong Li
Honglei Yin
OffRL
173
34
0
14 Oct 2021
Learning Natural Language Generation from Scratch
Alice Martin Donati
Guillaume Quispe
Charles Ollion
Sylvain Le Corff
Florian Strub
Olivier Pietquin
LRM
189
4
0
20 Sep 2021
Deep Reinforcement Learning for Equal Risk Pricing and Hedging under Dynamic Expectile Risk Measures
S. Marzban
Erick Delage
Jonathan Yu-Meng Li
146
4
0
09 Sep 2021
Implicitly Regularized RL with Implicit Q-Values
Nino Vieillard
Marcin Andrychowicz
Anton Raichuk
Olivier Pietquin
Matthieu Geist
OffRL
231
9
0
16 Aug 2021
Optimal Actor-Critic Policy with Optimized Training Datasets
C. Banerjee
Zhiyong Chen
N. Noman
M. Zamani
OffRL
308
10
0
16 Aug 2021
Off-Policy Reinforcement Learning with Delayed Rewards
International Conference on Machine Learning (ICML), 2021
Beining Han
Zhizhou Ren
Zuofan Wu
Yuanshuo Zhou
Jian-wei Peng
OffRL
205
46
0
22 Jun 2021
On the Convergence Rate of Off-Policy Policy Optimization Methods with Density-Ratio Correction
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021
Jiawei Huang
Nan Jiang
353
6
0
02 Jun 2021
Learning to Optimize Industry-Scale Dynamic Pickup and Delivery Problems
IEEE International Conference on Data Engineering (ICDE), 2021
Xijun Li
Weilin Luo
Mingxuan Yuan
Jun Wang
Jiawen Lu
Jie Wang
Jinhu Lu
Jia Zeng
214
51
0
27 May 2021
Finite-Sample Analysis of Off-Policy Natural Actor-Critic with Linear Function Approximation
IEEE Control Systems Letters (L-CSS), 2021
Zaiwei Chen
S. Khodadadian
S. T. Maguluri
OffRL
297
33
0
26 May 2021
Unbiased Asymmetric Reinforcement Learning under Partial Observability
Adaptive Agents and Multi-Agent Systems (AAMAS), 2021
Andrea Baisero
Chris Amato
OffRL
255
27
0
25 May 2021
Towards a Sample Efficient Reinforcement Learning Pipeline for Vision Based Robotics
Maxence Mahe
Pierre Belamri
Jesús Bujalance Martín
206
0
0
20 May 2021
Behavior-Guided Actor-Critic: Improving Exploration via Learning Policy Behavior Representation for Deep Reinforcement Learning
Ammar Fayad
M. Ibrahim
BDL
172
3
0
09 Apr 2021
NQMIX: Non-monotonic Value Function Factorization for Deep Multi-Agent Reinforcement Learning
Quanlin Chen
OffRL
285
0
0
05 Apr 2021
Joint Resource Management for MC-NOMA: A Deep Reinforcement Learning Approach
IEEE Transactions on Wireless Communications (IEEE TWC), 2021
Shaoyang Wang
Tiejun Lv
Wei Ni
N. Beaulieu
Y. Guo
117
55
0
29 Mar 2021
Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm
International Conference on Machine Learning (ICML), 2021
S. Khodadadian
Zaiwei Chen
S. T. Maguluri
CML
OffRL
361
33
0
18 Feb 2021
Smoothed functional-based gradient algorithms for off-policy reinforcement learning: A non-asymptotic viewpoint
Nithia Vijayan
Prashanth L. A.
OffRL
396
7
0
06 Jan 2021
Adaptable Automation with Modular Deep Reinforcement Learning and Policy Transfer
Engineering applications of artificial intelligence (EAAI), 2020
Zohreh Raziei
Mohsen Moghaddam
192
29
0
27 Nov 2020
What About Inputing Policy in Value Function: Policy Representation and Policy-extended Value Function Approximator
Hongyao Tang
Zhaopeng Meng
Jianye Hao
Chong Chen
D. Graves
...
Hangyu Mao
Wulong Liu
Yaodong Yang
Wenyuan Tao
Li Wang
OffRL
323
6
0
19 Oct 2020
Human-centric Dialog Training via Offline Reinforcement Learning
Natasha Jaques
J. Shen
Asma Ghandeharioun
Craig Ferguson
Àgata Lapedriza
Noah J. Jones
S. Gu
Rosalind W. Picard
OffRL
268
115
0
12 Oct 2020
Neural Mask Generator: Learning to Generate Adaptive Word Maskings for Language Model Adaptation
Minki Kang
Moonsu Han
Sung Ju Hwang
OOD
295
18
0
06 Oct 2020
Learning from eXtreme Bandit Feedback
AAAI Conference on Artificial Intelligence (AAAI), 2020
Romain Lopez
Inderjit S. Dhillon
Sai Li
OffRL
259
26
0
27 Sep 2020
Reinforcement Learning for Strategic Recommendations
Georgios Theocharous
Yash Chandak
Philip S. Thomas
F. D. Nijs
OffRL
250
13
0
15 Sep 2020
Variance-Reduced Off-Policy Memory-Efficient Policy Search
Daoming Lyu
Qi Qi
Mohammad Ghavamzadeh
Hengshuai Yao
Tianbao Yang
Bo Liu
OffRL
221
7
0
14 Sep 2020
Forward and inverse reinforcement learning sharing network weights and hyperparameters
E. Uchibe
Kenji Doya
193
22
0
17 Aug 2020
Off-Policy Multi-Agent Decomposed Policy Gradients
International Conference on Learning Representations (ICLR), 2020
Yihan Wang
Beining Han
Tonghan Wang
Heng Dong
Chongjie Zhang
281
206
0
24 Jul 2020
EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL
Seyed Kamyar Seyed Ghasemipour
Dale Schuurmans
S. Gu
OffRL
592
137
0
21 Jul 2020
Meta-Gradient Reinforcement Learning with an Objective Discovered Online
Neural Information Processing Systems (NeurIPS), 2020
Zhongwen Xu
H. V. Hasselt
Matteo Hessel
Junhyuk Oh
Satinder Singh
David Silver
353
89
0
16 Jul 2020
Qgraph-bounded Q-learning: Stabilizing Model-Free Off-Policy Deep Reinforcement Learning
Sabrina Hoppe
Marc Toussaint
OffRL
216
7
0
15 Jul 2020
Deep reinforcement learning driven inspection and maintenance planning under incomplete information and constraints
C. Andriotis
K. Papakonstantinou
211
120
0
02 Jul 2020
Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization
Neural Information Processing Systems (NeurIPS), 2020
Paul Barde
Julien Roy
Wonseok Jeon
Joelle Pineau
C. Pal
Derek Nowrouzezahrai
318
29
0
23 Jun 2020
AWAC: Accelerating Online Reinforcement Learning with Offline Datasets
Ashvin Nair
Abhishek Gupta
Murtaza Dalal
Sergey Levine
OffRL
OnRL
920
774
0
16 Jun 2020