2007.02151
Variational Policy Gradient Method for Reinforcement Learning with General Utilities
4 July 2020
Junyu Zhang
Alec Koppel
Amrit Singh Bedi
Csaba Szepesvári
Mengdi Wang
Papers citing
"Variational Policy Gradient Method for Reinforcement Learning with General Utilities"
26 / 26 papers shown
Title
Online Episodic Convex Reinforcement Learning
B. Moreno
Khaled Eldowa
Pierre Gaillard
Margaux Brégère
Nadia Oudjane
OffRL
27
0
0
12 May 2025
Is there Value in Reinforcement Learning?
Lior Fox
Y. Loewenstein
OffRL
59
0
0
07 May 2025
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
Qining Zhang
Lei Ying
OffRL
37
2
0
25 Sep 2024
Global Reinforcement Learning: Beyond Linear and Convex Rewards via Submodular Semi-gradient Methods
Ric De Santi
Manish Prajapat
Andreas Krause
36
3
0
13 Jul 2024
MetaCURL: Non-stationary Concave Utility Reinforcement Learning
B. Moreno
Margaux Brégère
Pierre Gaillard
Nadia Oudjane
OffRL
35
0
0
30 May 2024
A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints
Bram De Cooman
Johan A. K. Suykens
23
0
0
25 Apr 2024
Neural Network Approximation for Pessimistic Offline Reinforcement Learning
Di Wu
Yuling Jiao
Li Shen
Haizhao Yang
Xiliang Lu
OffRL
27
1
0
19 Dec 2023
Policy Gradient Converges to the Globally Optimal Policy for Nearly Linear-Quadratic Regulators
Yin-Huan Han
Meisam Razaviyayn
Renyuan Xu
22
5
0
15 Mar 2023
n-Step Temporal Difference Learning with Optimal n
Lakshmi Mandal
S. Bhatnagar
16
2
0
13 Mar 2023
Scalable Multi-Agent Reinforcement Learning with General Utilities
Donghao Ying
Yuhao Ding
Alec Koppel
Javad Lavaei
34
1
0
15 Feb 2023
Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning
Hanlin Zhu
Paria Rashidinejad
Jiantao Jiao
OffRL
30
15
0
30 Jan 2023
Proximal Mean Field Learning in Shallow Neural Networks
Alexis M. H. Teter
Iman Nodozi
A. Halder
FedML
35
1
0
25 Oct 2022
Cross apprenticeship learning framework: Properties and solution approaches
A. Aravind
Debasish Chatterjee
A. Cherukuri
26
0
0
06 Sep 2022
Improved Policy Optimization for Online Imitation Learning
J. Lavington
Sharan Vaswani
Mark W. Schmidt
OffRL
13
6
0
29 Jul 2022
Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning
Ruida Zhou
Tao-Wen Liu
D. Kalathil
P. R. Kumar
Chao Tian
21
13
0
10 Jun 2022
Jump-Start Reinforcement Learning
Ikechukwu Uchendu
Ted Xiao
Yao Lu
Banghua Zhu
Mengyuan Yan
...
Chuyuan Fu
Cong Ma
Jiantao Jiao
Sergey Levine
Karol Hausman
OffRL
OnRL
33
107
0
05 Apr 2022
Challenging Common Assumptions in Convex Reinforcement Learning
Mirco Mutti
Ric De Santi
Piersilvio De Bartolomeis
Marcello Restelli
OffRL
24
21
0
03 Feb 2022
Theoretical Guarantees of Fictitious Discount Algorithms for Episodic Reinforcement Learning and Global Convergence of Policy Gradient Methods
Xin Guo
Anran Hu
Junzi Zhang
OffRL
16
6
0
13 Sep 2021
Concave Utility Reinforcement Learning with Zero-Constraint Violations
Mridul Agarwal
Qinbo Bai
Vaneet Aggarwal
28
12
0
12 Sep 2021
Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
Andrea Zanette
Martin J. Wainwright
Emma Brunskill
OffRL
29
111
0
19 Aug 2021
A general sample complexity analysis of vanilla policy gradient
Rui Yuan
Robert Mansel Gower
A. Lazaric
69
62
0
23 Jul 2021
Concave Utility Reinforcement Learning: the Mean-Field Game Viewpoint
M. Geist
Julien Pérolat
Mathieu Laurière
Romuald Elie
Sarah Perrin
Olivier Bachem
Rémi Munos
Olivier Pietquin
19
62
0
07 Jun 2021
Finite-Sample Analysis of Off-Policy Natural Actor-Critic with Linear Function Approximation
Zaiwei Chen
S. Khodadadian
S. T. Maguluri
OffRL
52
29
0
26 May 2021
On the Linear convergence of Natural Policy Gradient Algorithm
S. Khodadadian
P. Jhunjhunwala
Sushil Mahavir Varma
S. T. Maguluri
30
56
0
04 May 2021
Is Pessimism Provably Efficient for Offline RL?
Ying Jin
Zhuoran Yang
Zhaoran Wang
OffRL
27
345
0
30 Dec 2020
Sample Efficient Reinforcement Learning with REINFORCE
Junzi Zhang
Jongho Kim
Brendan O'Donoghue
Stephen P. Boyd
35
99
0
22 Oct 2020