Variational Policy Gradient Method for Reinforcement Learning with General Utilities

4 July 2020
Junyu Zhang
Alec Koppel
Amrit Singh Bedi
Csaba Szepesvári
Mengdi Wang

Papers citing "Variational Policy Gradient Method for Reinforcement Learning with General Utilities"

50 / 87 papers shown
Flow Density Control: Generative Optimization Beyond Entropy-Regularized Fine-Tuning
Riccardo De Santi
Marin Vlastelica
Ya-Ping Hsieh
Zebang Shen
Niao He
Andreas Krause
AI4CE
145
5
0
27 Nov 2025
On the Convergence of Policy Mirror Descent with Temporal Difference Evaluation
Jiacai Liu
Wenye Li
Ke Wei
217
1
0
23 Sep 2025
Policy Gradient with Self-Attention for Model-Free Distributed Nonlinear Multi-Agent Games
Eduardo Sebastián
Maitrayee Keskar
Eeman Iqbal
Eduardo Montijano
C. Sagüés
Nikolay Atanasov
184
0
0
22 Sep 2025
Bayesian Risk-Sensitive Policy Optimization For MDPs With General Loss Functions
Xiaoshuang Wang
Yifan Lin
Enlu Zhou
216
0
0
19 Sep 2025
The Geometry of Nonlinear Reinforcement Learning
Nikola Milosevic
Nico Scherf
131
0
0
01 Sep 2025
Solving General-Utility Markov Decision Processes in the Single-Trial Regime with Online Planning
Pedro P. Santos
Alberto Sardinha
Francisco S. Melo
115
0
0
21 May 2025
Online Episodic Convex Reinforcement Learning
B. Moreno
Khaled Eldowa
Pierre Gaillard
Margaux Brégère
Nadia Oudjane
OffRL
353
0
0
12 May 2025
Is there Value in Reinforcement Learning?
Lior Fox
Y. Loewenstein
OffRL
256
0
0
07 May 2025
Kernel-Based Function Approximation for Average Reward Reinforcement Learning: An Optimist No-Regret Algorithm
Neural Information Processing Systems (NeurIPS), 2024
Sattar Vakili
Julia Olkhovskaya
345
3
0
30 Oct 2024
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
International Conference on Learning Representations (ICLR), 2024
Qining Zhang
Lei Ying
OffRL
562
10
0
25 Sep 2024
The Number of Trials Matters in Infinite-Horizon General-Utility Markov Decision Processes
Pedro P. Santos
Alberto Sardinha
Francisco S. Melo
197
1
0
23 Sep 2024
Geometric Active Exploration in Markov Decision Processes: the Benefit of Abstraction
Ric De Santi
Federico Arangath Joseph
Noah Liniger
Mirco Mutti
Andreas Krause
AI4CE
270
5
0
18 Jul 2024
Global Reinforcement Learning: Beyond Linear and Convex Rewards via Submodular Semi-gradient Methods
Ric De Santi
Manish Prajapat
Andreas Krause
332
13
0
13 Jul 2024
MetaCURL: Non-stationary Concave Utility Reinforcement Learning
B. Moreno
Margaux Brégère
Pierre Gaillard
Nadia Oudjane
OffRL
280
3
0
30 May 2024
Inverse Concave-Utility Reinforcement Learning is Inverse Game Theory
M. Çelikok
F. Oliehoek
Jan-Willem van de Meent
342
2
0
29 May 2024
A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints
Bram De Cooman
Johan A. K. Suykens
341
1
0
25 Apr 2024
On the Global Convergence of Policy Gradient in Average Reward Markov Decision Processes
Navdeep Kumar
Yashaswini Murthy
Itai Shufaro
Kfir Y. Levy
R. Srikant
Shie Mannor
228
11
0
11 Mar 2024
Taming Nonconvex Stochastic Mirror Descent with General Bregman Divergence
Ilyas Fatkhullin
Niao He
379
16
0
27 Feb 2024
Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning
Zihao Li
Boyi Liu
Zhuoran Yang
Zhaoran Wang
Mengdi Wang
343
2
0
16 Feb 2024
MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences
Souradip Chakraborty
Jiahao Qiu
Hui Yuan
Alec Koppel
Furong Huang
Dinesh Manocha
Amrit Singh Bedi
Mengdi Wang
ALM
233
29
0
14 Feb 2024
On the Limitations of Markovian Rewards to Express Multi-Objective, Risk-Sensitive, and Modal Tasks
Conference on Uncertainty in Artificial Intelligence (UAI), 2024
Joar Skalse
Alessandro Abate
263
13
0
26 Jan 2024
On the Stochastic (Variance-Reduced) Proximal Gradient Method for Regularized Expected Reward Optimization
Ling Liang
Haizhao Yang
237
1
0
23 Jan 2024
Quantum Advantage Actor-Critic for Reinforcement Learning
International Conference on Agents and Artificial Intelligence (ICAART), 2024
Michael Kolle
Mohamad Hgog
Fabian Ritz
Philipp Altmann
Maximilian Zorn
Jonas Stein
Claudia Linnhoff-Popien
299
17
0
13 Jan 2024
Global Convergence of Natural Policy Gradient with Hessian-aided Momentum Variance Reduction
Journal of Scientific Computing (J. Sci. Comput.), 2024
Jie Feng
Ke Wei
Jinchi Chen
412
4
0
02 Jan 2024
Neural Network Approximation for Pessimistic Offline Reinforcement Learning
Di Wu
Yuling Jiao
Li Shen
Haizhao Yang
Xiliang Lu
OffRL
307
2
0
19 Dec 2023
Efficient Model-Based Concave Utility Reinforcement Learning through Greedy Mirror Descent
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
B. Moreno
Margaux Brégère
Pierre Gaillard
Nadia Oudjane
300
5
0
30 Nov 2023
Stable In-hand Manipulation with Finger Specific Multi-agent Shadow Reward
Lingfeng Tao
Jiucai Zhang
Xiaoli Zhang
250
0
0
13 Sep 2023
Diversifying AI: Towards Creative Chess with AlphaZero
Tom Zahavy
Vivek Veeriah
Shaobo Hou
Kevin Waugh
Matthew Lai
Edouard Leurent
Nenad Tomašev
Lisa Schut
Demis Hassabis
Satinder Singh
322
23
0
17 Aug 2023
Invex Programs: First Order Algorithms and Their Convergence
Adarsh Barik
S. Sra
Jean Honorio
263
5
0
10 Jul 2023
Active Coverage for PAC Reinforcement Learning
Annual Conference Computational Learning Theory (COLT), 2023
Aymen Al Marjani
Andrea Tirinzoni
E. Kaufmann
OffRL
262
7
0
23 Jun 2023
A Single-Loop Deep Actor-Critic Algorithm for Constrained Reinforcement Learning with Provable Convergence
Kexuan Wang
An Liu
Baishuo Liu
202
1
0
10 Jun 2023
Reinforcement Learning with General Utilities: Simpler Variance Reduction and Large State-Action Space
International Conference on Machine Learning (ICML), 2023
Anas Barakat
Ilyas Fatkhullin
Niao He
258
17
0
02 Jun 2023
On the Linear Convergence of Policy Gradient under Hadamard Parameterization
Information and Inference: A Journal of the IMA (JIII), 2023
Jiacai Liu
Jinchi Chen
Ke Wei
284
4
0
31 May 2023
Scalable Primal-Dual Actor-Critic Method for Safe Multi-Agent RL with General Utilities
Neural Information Processing Systems (NeurIPS), 2023
Donghao Ying
Yunkai Zhang
Yuhao Ding
Alec Koppel
Javad Lavaei
423
22
0
27 May 2023
Inverse Reinforcement Learning with the Average Reward Criterion
Neural Information Processing Systems (NeurIPS), 2023
Feiyang Wu
Jingyang Ke
Anqi Wu
421
14
0
24 May 2023
A Coupled Flow Approach to Imitation Learning
International Conference on Machine Learning (ICML), 2023
G. Freund
Elad Sarafian
Sarit Kraus
OOD
233
16
0
29 Apr 2023
What can online reinforcement learning with function approximation benefit from general coverage conditions?
International Conference on Machine Learning (ICML), 2023
Fanghui Liu
Luca Viano
Volkan Cevher
OffRL
341
6
0
25 Apr 2023
Policy Gradient Converges to the Globally Optimal Policy for Nearly Linear-Quadratic Regulators
SIAM Journal on Control and Optimization (SICON), 2023
Yin-Huan Han
Meisam Razaviyayn
Renyuan Xu
503
7
0
15 Mar 2023
n-Step Temporal Difference Learning with Optimal n
Lakshmi Mandal
S. Bhatnagar
465
3
0
13 Mar 2023
Deep Reinforcement Learning for Cost-Effective Medical Diagnosis
International Conference on Learning Representations (ICLR), 2023
Zheng Yu
Yikuan Li
Joseph C. Kim
Kai Huang
Yuan Luo
Mengdi Wang
OffRL
378
23
0
20 Feb 2023
Scalable Multi-Agent Reinforcement Learning with General Utilities
American Control Conference (ACC), 2023
Donghao Ying
Yuhao Ding
Alec Koppel
Javad Lavaei
278
2
0
15 Feb 2023
Provably Efficient Offline Goal-Conditioned Reinforcement Learning with General Function Approximation and Single-Policy Concentrability
Neural Information Processing Systems (NeurIPS), 2023
Hanlin Zhu
Amy Zhang
OffRL
368
5
0
07 Feb 2023
Stochastic Policy Gradient Methods: Improved Sample Complexity for Fisher-non-degenerate Policies
International Conference on Machine Learning (ICML), 2023
Ilyas Fatkhullin
Anas Barakat
Anastasia Kireeva
Niao He
461
60
0
03 Feb 2023
A Novel Framework for Policy Mirror Descent with General Parameterization and Linear Convergence
Neural Information Processing Systems (NeurIPS), 2023
Carlo Alfano
Rui Yuan
Patrick Rebeschini
669
24
0
30 Jan 2023
Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2023
Hanlin Zhu
Paria Rashidinejad
Jiantao Jiao
OffRL
565
20
0
30 Jan 2023
Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian
International Conference on Learning Representations (ICLR), 2022
Paria Rashidinejad
Hanlin Zhu
Kunhe Yang
Stuart J. Russell
Jiantao Jiao
OffRL
469
34
0
01 Nov 2022
Proximal Mean Field Learning in Shallow Neural Networks
Alexis M. H. Teter
Iman Nodozi
A. Halder
FedML
312
1
0
25 Oct 2022
Policy Gradient for Reinforcement Learning with General Utilities
Navdeep Kumar
Kaixin Wang
Kfir Y. Levy
Shie Mannor
119
6
0
03 Oct 2022
On the convex formulations of robust Markov decision processes
Mathematics of Operations Research (MOR), 2022
Julien Grand-Clément
Marek Petrik
313
13
0
21 Sep 2022
Cross apprenticeship learning framework: Properties and solution approaches
A. Aravind
Debasish Chatterjee
A. Cherukuri
217
0
0
06 Sep 2022