Off-Policy Actor-Critic
International Conference on Machine Learning (ICML), 2012
22 May 2012
T. Degris
Martha White
R. Sutton
OffRL, CML

Papers citing "Off-Policy Actor-Critic"

50 / 117 papers shown
The Actor-Critic Update Order Matters for PPO in Federated Reinforcement Learning
Zhijie Xie
Shenghui Song
282
0
0
02 Jun 2025
Multi-agent Off-policy Actor-Critic Reinforcement Learning for Partially Observable Environments
Ainur Zhaikhan
Ali H. Sayed
OffRL
278
1
0
06 Jul 2024
Distillation Policy Optimization
Jianfei Ma
OffRL
620
1
0
01 Feb 2023
Reinforcement Learning with Large Action Spaces for Neural Machine Translation
International Conference on Computational Linguistics (COLING), 2022
Asaf Yehudai
Leshem Choshen
Lior Fox
Omri Abend
313
7
0
06 Oct 2022
Improved Policy Optimization for Online Imitation Learning
J. Lavington
Sharan Vaswani
Mark Schmidt
OffRL
325
7
0
29 Jul 2022
Interactive Imitation Learning in Robotics based on Simulations
Xinyi Liu
300
2
0
26 Jul 2022
Continual Meta-Reinforcement Learning for UAV-Aided Vehicular Wireless Networks
Riccardo Marini
Sangwoo Park
Osvaldo Simeone
C. Buratti
366
11
0
13 Jul 2022
Efficient Distributed Framework for Collaborative Multi-Agent Reinforcement Learning
Shuhan Qi
Shuhao Zhang
Xiaohan Hou
Jia-jia Zhang
Xinyu Wang
Jing Xiao
227
0
0
11 May 2022
Variance Reduction based Partial Trajectory Reuse to Accelerate Policy Gradient Optimization
Hua Zheng
Wei Xie
344
2
0
06 May 2022
TASAC: a twin-actor reinforcement learning framework with stochastic policy for batch process control
Tanuja Joshi
H. Kodamana
Harikumar Kandath
N. Kaisare
OffRL
123
0
0
22 Apr 2022
Remember and Forget Experience Replay for Multi-Agent Reinforcement Learning
Pascal Weber
Daniel Wälchli
Mustafa Zeqiri
Petros Koumoutsakos
CLL, OffRL
254
9
0
24 Mar 2022
Residual Robot Learning for Object-Centric Probabilistic Movement Primitives
João Carvalho
Dorothea Koert
Marek Daniv
Jan Peters
258
12
0
08 Mar 2022
A Temporal-Difference Approach to Policy Gradient Estimation
International Conference on Machine Learning (ICML), 2022
Samuele Tosatto
Andrew Patterson
Martha White
A. R. Mahmood
OffRL
522
3
0
04 Feb 2022
Optimal Estimation of Off-Policy Policy Gradient via Double Fitted Iteration
Chengzhuo Ni
Ruiqi Zhang
Xiang Ji
Xuezhou Zhang
Mengdi Wang
OffRL
423
1
0
31 Jan 2022
GCS: Graph-based Coordination Strategy for Multi-Agent Reinforcement Learning
Adaptive Agents and Multi-Agent Systems (AAMAS), 2022
Jingqing Ruan
Yali Du
Xuantang Xiong
Dengpeng Xing
Xiyun Li
Linghui Meng
Haifeng Zhang
Jun Wang
Bo Xu
184
39
0
17 Jan 2022
An Analytical Update Rule for General Policy Optimization
Hepeng Li
Nicholas Clavette
Haibo He
280
5
0
03 Dec 2021
Adaptively Calibrated Critic Estimates for Deep Reinforcement Learning
Nicolai Dorka
Tim Welschehold
Joschka Boedecker
Wolfram Burgard
OffRL
267
14
0
24 Nov 2021
Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction
A. Ahmad
Shuo Cheng
D. Saraswat
Aly El Gamal
Wenjie Wang
Gurmukh Johal
OffRL, OnRL
216
1
0
22 Oct 2021
Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm
Raghuram Bharadwaj Diddigi
Prateek Jain
P. J
S. Bhatnagar
CML, OffRL
351
3
0
19 Oct 2021
Offline Reinforcement Learning with Soft Behavior Regularization
Haoran Xu
Xianyuan Zhan
Jianxiong Li
Honglei Yin
OffRL
173
34
0
14 Oct 2021
Learning Natural Language Generation from Scratch
Alice Martin Donati
Guillaume Quispe
Charles Ollion
Sylvain Le Corff
Florian Strub
Olivier Pietquin
LRM
189
4
0
20 Sep 2021
Deep Reinforcement Learning for Equal Risk Pricing and Hedging under Dynamic Expectile Risk Measures
S. Marzban
Erick Delage
Jonathan Yu-Meng Li
146
4
0
09 Sep 2021
Implicitly Regularized RL with Implicit Q-Values
Nino Vieillard
Marcin Andrychowicz
Anton Raichuk
Olivier Pietquin
Matthieu Geist
OffRL
231
9
0
16 Aug 2021
Optimal Actor-Critic Policy with Optimized Training Datasets
C. Banerjee
Zhiyong Chen
N. Noman
M. Zamani
OffRL
308
10
0
16 Aug 2021
Off-Policy Reinforcement Learning with Delayed Rewards
International Conference on Machine Learning (ICML), 2021
Beining Han
Zhizhou Ren
Zuofan Wu
Yuanshuo Zhou
Jian-wei Peng
OffRL
205
46
0
22 Jun 2021
On the Convergence Rate of Off-Policy Policy Optimization Methods with Density-Ratio Correction
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021
Jiawei Huang
Nan Jiang
353
6
0
02 Jun 2021
Learning to Optimize Industry-Scale Dynamic Pickup and Delivery Problems
IEEE International Conference on Data Engineering (ICDE), 2021
Xijun Li
Weilin Luo
Mingxuan Yuan
Jun Wang
Jiawen Lu
Jie Wang
Jinhu Lu
Jia Zeng
214
51
0
27 May 2021
Finite-Sample Analysis of Off-Policy Natural Actor-Critic with Linear Function Approximation
IEEE Control Systems Letters (L-CSS), 2021
Zaiwei Chen
S. Khodadadian
S. T. Maguluri
OffRL
297
33
0
26 May 2021
Unbiased Asymmetric Reinforcement Learning under Partial Observability
Adaptive Agents and Multi-Agent Systems (AAMAS), 2021
Andrea Baisero
Chris Amato
OffRL
255
27
0
25 May 2021
Towards a Sample Efficient Reinforcement Learning Pipeline for Vision Based Robotics
Maxence Mahe
Pierre Belamri
Jesús Bujalance Martín
206
0
0
20 May 2021
Behavior-Guided Actor-Critic: Improving Exploration via Learning Policy Behavior Representation for Deep Reinforcement Learning
Ammar Fayad
M. Ibrahim
BDL
172
3
0
09 Apr 2021
NQMIX: Non-monotonic Value Function Factorization for Deep Multi-Agent Reinforcement Learning
Quanlin Chen
OffRL
285
0
0
05 Apr 2021
Joint Resource Management for MC-NOMA: A Deep Reinforcement Learning Approach
IEEE Transactions on Wireless Communications (IEEE TWC), 2021
Shaoyang Wang
Tiejun Lv
Wei Ni
N. Beaulieu
Y. Guo
117
55
0
29 Mar 2021
Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm
International Conference on Machine Learning (ICML), 2021
S. Khodadadian
Zaiwei Chen
S. T. Maguluri
CML, OffRL
361
33
0
18 Feb 2021
Smoothed functional-based gradient algorithms for off-policy reinforcement learning: A non-asymptotic viewpoint
Nithia Vijayan
Prashanth L. A.
OffRL
396
7
0
06 Jan 2021
Adaptable Automation with Modular Deep Reinforcement Learning and Policy Transfer
Engineering Applications of Artificial Intelligence (EAAI), 2020
Zohreh Raziei
Mohsen Moghaddam
192
29
0
27 Nov 2020
What About Inputing Policy in Value Function: Policy Representation and Policy-extended Value Function Approximator
Hongyao Tang
Zhaopeng Meng
Jianye Hao
Chong Chen
D. Graves
...
Hangyu Mao
Wulong Liu
Yaodong Yang
Wenyuan Tao
Li Wang
OffRL
323
6
0
19 Oct 2020
Human-centric Dialog Training via Offline Reinforcement Learning
Natasha Jaques
J. Shen
Asma Ghandeharioun
Craig Ferguson
Àgata Lapedriza
Noah J. Jones
S. Gu
Rosalind W. Picard
OffRL
268
115
0
12 Oct 2020
Neural Mask Generator: Learning to Generate Adaptive Word Maskings for Language Model Adaptation
Minki Kang
Moonsu Han
Sung Ju Hwang
OOD
295
18
0
06 Oct 2020
Learning from eXtreme Bandit Feedback
AAAI Conference on Artificial Intelligence (AAAI), 2020
Romain Lopez
Inderjit S. Dhillon
Sai Li
OffRL
259
26
0
27 Sep 2020
Reinforcement Learning for Strategic Recommendations
Georgios Theocharous
Yash Chandak
Philip S. Thomas
F. D. Nijs
OffRL
250
13
0
15 Sep 2020
Variance-Reduced Off-Policy Memory-Efficient Policy Search
Daoming Lyu
Qi Qi
Mohammad Ghavamzadeh
Hengshuai Yao
Tianbao Yang
Bo Liu
OffRL
221
7
0
14 Sep 2020
Forward and inverse reinforcement learning sharing network weights and hyperparameters
E. Uchibe
Kenji Doya
193
22
0
17 Aug 2020
Off-Policy Multi-Agent Decomposed Policy Gradients
International Conference on Learning Representations (ICLR), 2020
Yihan Wang
Beining Han
Tonghan Wang
Heng Dong
Chongjie Zhang
281
206
0
24 Jul 2020
EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL
Seyed Kamyar Seyed Ghasemipour
Dale Schuurmans
S. Gu
OffRL
592
137
0
21 Jul 2020
Meta-Gradient Reinforcement Learning with an Objective Discovered Online
Neural Information Processing Systems (NeurIPS), 2020
Zhongwen Xu
H. V. Hasselt
Matteo Hessel
Junhyuk Oh
Satinder Singh
David Silver
353
89
0
16 Jul 2020
Qgraph-bounded Q-learning: Stabilizing Model-Free Off-Policy Deep Reinforcement Learning
Sabrina Hoppe
Marc Toussaint
OffRL
216
7
0
15 Jul 2020
Deep reinforcement learning driven inspection and maintenance planning under incomplete information and constraints
C. Andriotis
K. Papakonstantinou
211
120
0
02 Jul 2020
Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization
Neural Information Processing Systems (NeurIPS), 2020
Paul Barde
Julien Roy
Wonseok Jeon
Joelle Pineau
C. Pal
Derek Nowrouzezahrai
318
29
0
23 Jun 2020
AWAC: Accelerating Online Reinforcement Learning with Offline Datasets
Ashvin Nair
Abhishek Gupta
Murtaza Dalal
Sergey Levine
OffRL, OnRL
920
774
0
16 Jun 2020