Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1707.01891
Cited By
Trust-PCL: An Off-Policy Trust Region Method for Continuous Control
6 July 2017
Ofir Nachum
Mohammad Norouzi
Kelvin Xu
Dale Schuurmans
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Trust-PCL: An Off-Policy Trust Region Method for Continuous Control"
26 / 26 papers shown
Title
Bayesian regularization of empirical MDPs
Samarth Gupta
Daniel N. Hill
Lexing Ying
Inderjit Dhillon
OffRL
32
0
0
03 Aug 2022
Accelerating Primal-dual Methods for Regularized Markov Decision Processes
Haoya Li
Hsiang-Fu Yu
Lexing Ying
Inderjit Dhillon
39
4
0
21 Feb 2022
From Psychological Curiosity to Artificial Curiosity: Curiosity-Driven Learning in Artificial Intelligence Tasks
Chenyu Sun
Hangwei Qian
Chunyan Miao
20
10
0
20 Jan 2022
Estimating Optimal Infinite Horizon Dynamic Treatment Regimes via pT-Learning
Wenzhuo Zhou
Ruoqing Zhu
Annie Qu
40
22
0
20 Oct 2021
Divergence-Regularized Multi-Agent Actor-Critic
Kefan Su
Zongqing Lu
46
25
0
01 Oct 2021
Implicitly Regularized RL with Implicit Q-Values
Nino Vieillard
Marcin Andrychowicz
Anton Raichuk
Olivier Pietquin
M. Geist
OffRL
24
9
0
16 Aug 2021
Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality
Tengyu Xu
Zhuoran Yang
Zhaoran Wang
Yingbin Liang
OffRL
49
24
0
23 Feb 2021
Dealing with Non-Stationarity in MARL via Trust-Region Decomposition
Wenhao Li
Xiangfeng Wang
Bo Jin
Junjie Sheng
H. Zha
36
7
0
21 Feb 2021
Mirror Descent Policy Optimization
Manan Tomar
Lior Shani
Yonathan Efroni
Mohammad Ghavamzadeh
30
83
0
20 May 2020
Off-Policy Deep Reinforcement Learning with Analogous Disentangled Exploration
Hoang Trung-Dung
Yitao Liang
Guy Van den Broeck
OffRL
22
3
0
25 Feb 2020
Direct and indirect reinforcement learning
Yang Guan
Shengbo Eben Li
Jingliang Duan
Jie Li
Yangang Ren
Qi Sun
B. Cheng
OffRL
38
34
0
23 Dec 2019
Multi-Path Policy Optimization
L. Pan
Qingpeng Cai
Longbo Huang
18
2
0
11 Nov 2019
Hindsight Trust Region Policy Optimization
Hanbo Zhang
Site Bai
Xuguang Lan
David Hsu
Nanning Zheng
38
8
0
29 Jul 2019
A Kernel Loss for Solving the Bellman Equation
Yihao Feng
Lihong Li
Qiang Liu
30
70
0
25 May 2019
On-Policy Trust Region Policy Optimisation with Replay Buffers
D. Kangin
N. Pugeault
OffRL
19
3
0
18 Jan 2019
Learning to Walk via Deep Reinforcement Learning
Tuomas Haarnoja
Sehoon Ha
Aurick Zhou
Jie Tan
George Tucker
Sergey Levine
54
433
0
26 Dec 2018
TD-Regularized Actor-Critic Methods
Simone Parisi
Voot Tangkaratt
Jan Peters
Mohammad Emtiyaz Khan
OffRL
30
32
0
19 Dec 2018
Policy Optimization as Wasserstein Gradient Flows
Ruiyi Zhang
Changyou Chen
Chunyuan Li
Lawrence Carin
21
66
0
09 Aug 2018
Data-Efficient Hierarchical Reinforcement Learning
Ofir Nachum
S. Gu
Honglak Lee
Sergey Levine
OffRL
68
797
0
21 May 2018
Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review
Sergey Levine
AI4CE
BDL
33
662
0
02 May 2018
Composable Deep Reinforcement Learning for Robotic Manipulation
Tuomas Haarnoja
Vitchyr H. Pong
Aurick Zhou
Murtaza Dalal
Pieter Abbeel
Sergey Levine
30
230
0
19 Mar 2018
Path Consistency Learning in Tsallis Entropy Regularized MDPs
Ofir Nachum
Yinlam Chow
Mohammad Ghavamzadeh
29
45
0
10 Feb 2018
Expected Policy Gradients for Reinforcement Learning
K. Ciosek
Shimon Whiteson
50
51
0
10 Jan 2018
SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation
Bo Dai
Albert Eaton Shaw
Lihong Li
Lin Xiao
Niao He
Zhen Liu
Jianshu Chen
Le Song
34
25
0
29 Dec 2017
A short variational proof of equivalence between policy gradients and soft Q learning
Pierre Harvey Richemond
B. Maginnis
16
5
0
22 Dec 2017
Emergence of Locomotion Behaviours in Rich Environments
N. Heess
TB Dhruva
S. Sriram
Jay Lemmon
J. Merel
...
Tom Erez
Ziyun Wang
S. M. Ali Eslami
Martin Riedmiller
David Silver
143
928
0
07 Jul 2017
1