Is the Policy Gradient a Gradient?


17 June 2019
Chris Nota
Philip S. Thomas

Papers citing "Is the Policy Gradient a Gradient?"

40 / 40 papers shown
Analysis of On-policy Policy Gradient Methods under the Distribution Mismatch
Weizhen Wang
Jianping He
Xiaoming Duan
46
0
0
28 Mar 2025
Optimizing Navigation And Chemical Application in Precision Agriculture With Deep Reinforcement Learning And Conditional Action Tree
Mahsa Khosravi
Zhanhong Jiang
Joshua R. Waite
Sarah Jones
Hernan Torres
Arti Singh
Baskar Ganapathysubramanian
Asheesh Kumar Singh
Soumik Sarkar
51
0
0
23 Mar 2025
Streaming Deep Reinforcement Learning Finally Works
Mohamed Elsayed
Gautham Vasan
A. R. Mahmood
OffRL
60
4
0
18 Oct 2024
An Introduction to Centralized Training for Decentralized Execution in Cooperative Multi-Agent Reinforcement Learning
Christopher Amato
OffRL
41
9
0
04 Sep 2024
OMPO: A Unified Framework for RL under Policy and Dynamics Shifts
Yu-Juan Luo
Tianying Ji
Gang Hua
Jianwei Zhang
Huazhe Xu
Xianyuan Zhan
OffRL
71
3
0
29 May 2024
No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO
Skander Moalla
Andrea Miele
Razvan Pascanu
Çağlar Gülçehre
41
4
0
01 May 2024
Feint in Multi-Player Games
Junyu Liu
Wangkai Jin
Xiangjun Peng
OffRL
38
0
0
04 Mar 2024
Behavior Alignment via Reward Function Optimization
Dhawal Gupta
Yash Chandak
Scott M. Jordan
Philip S. Thomas
Bruno Castro da Silva
36
10
0
29 Oct 2023
Would I have gotten that reward? Long-term credit assignment by counterfactual contribution analysis
Alexander Meulemans
Simon Schug
Seijin Kobayashi
Nathaniel D. Daw
Gregory Wayne
55
3
0
29 Jun 2023
Correcting discount-factor mismatch in on-policy policy gradient methods
Fengdi Che
Gautham Vasan
A. R. Mahmood
OffRL
33
9
0
23 Jun 2023
Accelerating Value Iteration with Anchoring
Jongmin Lee
Ernest K. Ryu
35
7
0
26 May 2023
A Coupled Flow Approach to Imitation Learning
G. Freund
Elad Sarafian
Sarit Kraus
OOD
38
12
0
29 Apr 2023
A Tale of Sampling and Estimation in Discounted Reinforcement Learning
Alberto Maria Metelli
Mirco Mutti
Marcello Restelli
OffRL
57
2
0
11 Apr 2023
Revisiting Estimation Bias in Policy Gradients for Deep Reinforcement Learning
Haoxuan Pan
Deheng Ye
Xiaoming Duan
Qiang Fu
Wei Yang
Jianping He
Mingfei Sun
OffRL
25
2
0
20 Jan 2023
On the Convergence of Discounted Policy Gradient Methods
Chris Nota
26
0
0
28 Dec 2022
On Many-Actions Policy Gradient
Michal Nauman
Marek Cygan
24
0
0
24 Oct 2022
Cliff Diving: Exploring Reward Surfaces in Reinforcement Learning Environments
Ryan Sullivan
J. K. Terry
Benjamin Black
John P. Dickerson
35
8
0
14 May 2022
Is Vanilla Policy Gradient Overlooked? Analyzing Deep Reinforcement Learning for Hanabi
Bram Grooten
Jelle Wemmenhove
Maurice Poot
J. Portegies
29
3
0
22 Mar 2022
Influencing Long-Term Behavior in Multiagent Reinforcement Learning
Dong-Ki Kim
Matthew D Riemer
Miao Liu
Jakob N. Foerster
Michael Everett
Chuangchuang Sun
Gerald Tesauro
Jonathan P. How
63
0
0
07 Mar 2022
Distributional Reinforcement Learning for Scheduling of Chemical Production Processes
M. Mowbray
Dongda Zhang
Ehecatl Antonio del Rio Chanona
OffRL
30
6
0
01 Mar 2022
Beyond the Policy Gradient Theorem for Efficient Policy Updates in Actor-Critic Algorithms
Romain Laroche
Rémi Tachet des Combes
53
2
0
15 Feb 2022
A Temporal-Difference Approach to Policy Gradient Estimation
Samuele Tosatto
Andrew Patterson
Martha White
A. R. Mahmood
OffRL
36
2
0
04 Feb 2022
3DPG: Distributed Deep Deterministic Policy Gradient Algorithms for Networked Multi-Agent Systems
Adrian Redder
Arunselvan Ramaswamy
Holger Karl
OffRL
26
2
0
03 Jan 2022
Continual Learning In Environments With Polynomial Mixing Times
Matthew D Riemer
Sharath Chandra Raparthy
Ignacio Cases
G. Subbaraj
M. P. Touzel
Irina Rish
CLL
46
8
0
13 Dec 2021
Dr Jekyll and Mr Hyde: the Strange Case of Off-Policy Policy Updates
Romain Laroche
Rémi Tachet des Combes
51
8
0
29 Sep 2021
Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences
Alan Chan
Hugo Silva
Sungsu Lim
Tadashi Kozuno
A. R. Mahmood
Martha White
30
29
0
17 Jul 2021
Examining average and discounted reward optimality criteria in reinforcement learning
Vektor Dewanto
M. Gallagher
OffRL
22
17
0
03 Jul 2021
Taylor Expansion of Discount Factors
Yunhao Tang
Mark Rowland
Rémi Munos
Michal Valko
OffRL
44
5
0
11 Jun 2021
Adversarial Intrinsic Motivation for Reinforcement Learning
Ishan Durugkar
Mauricio Tec
S. Niekum
Peter Stone
OOD
69
39
0
27 May 2021
Multi-Agent Reinforcement Learning with Temporal Logic Specifications
Lewis Hammond
Alessandro Abate
Julian Gutierrez
Michael Wooldridge
AI4CE
55
32
0
01 Feb 2021
Batch Reinforcement Learning with a Nonparametric Off-Policy Policy Gradient
Samuele Tosatto
João Carvalho
Jan Peters
OffRL
24
7
0
27 Oct 2020
Logistic Q-Learning
Joan Bas-Serrano
Sebastian Curi
Andreas Krause
Gergely Neu
36
40
0
21 Oct 2020
A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms
Shangtong Zhang
Romain Laroche
H. V. Seijen
Shimon Whiteson
Rémi Tachet des Combes
51
15
0
02 Oct 2020
State Action Separable Reinforcement Learning
Ziyao Zhang
Liang Ma
K. Leung
Konstantinos Poularakis
Mudhakar Srivatsa
36
2
0
05 Jun 2020
F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning
Wenhao Li
Bo Jin
Xiangfeng Wang
Junchi Yan
H. Zha
30
21
0
17 Apr 2020
Universal Value Density Estimation for Imitation Learning and Goal-Conditioned Reinforcement Learning
Yannick Schroecker
Charles Isbell
OffRL
36
13
0
15 Feb 2020
Modern Deep Reinforcement Learning Algorithms
Sergey Ivanov
A. D'yakonov
OffRL
29
39
0
24 Jun 2019
Classical Policy Gradient: Preserving Bellman's Principle of Optimality
Philip S. Thomas
Scott M. Jordan
Yash Chandak
Chris Nota
James E. Kostas
OffRL
11
0
0
06 Jun 2019
Smoothing Policies and Safe Policy Gradients
Matteo Papini
Matteo Pirotta
Marcello Restelli
37
30
0
08 May 2019
Greedy Actor-Critic: A New Conditional Cross-Entropy Method for Policy Improvement
Samuel Neumann
Sungsu Lim
A. Joseph
Yangchen Pan
Adam White
Martha White
33
7
0
22 Oct 2018