ResearchTrend.AI
On the Global Convergence Rates of Softmax Policy Gradient Methods

13 May 2020
Jincheng Mei
Chenjun Xiao
Csaba Szepesvári
Dale Schuurmans

Papers citing "On the Global Convergence Rates of Softmax Policy Gradient Methods"

50 / 185 papers shown
Title
Dealing with Sparse Rewards in Continuous Control Robotics via Heavy-Tailed Policies
Souradip Chakraborty
Amrit Singh Bedi
Alec Koppel
Pratap Tokekar
Dinesh Manocha
10
8
0
12 Jun 2022
Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning
Ruida Zhou
Tao-Wen Liu
D. Kalathil
P. R. Kumar
Chao Tian
21
13
0
10 Jun 2022
Policy Optimization for Markov Games: Unified Framework and Faster Convergence
Runyu Zhang
Qinghua Liu
Haiquan Wang
Caiming Xiong
Na Li
Yu Bai
13
26
0
06 Jun 2022
Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPs
Dongsheng Ding
K. Zhang
Jiali Duan
Tamer Başar
Mihailo R. Jovanović
13
19
0
06 Jun 2022
Algorithm for Constrained Markov Decision Process with Linear Convergence
E. Gladin
Maksim Lavrik-Karmazin
K. Zainullina
Varvara Rudenko
Alexander V. Gasnikov
Martin Takáč
20
6
0
03 Jun 2022
KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal
Tadashi Kozuno
Wenhao Yang
Nino Vieillard
Toshinori Kitamura
Yunhao Tang
...
Michal Valko
Rémi Munos
Olivier Pietquin
M. Geist
Csaba Szepesvári
97
10
0
27 May 2022
Solving infinite-horizon POMDPs with memoryless stochastic policies in state-action space
Johannes Muller
Guido Montúfar
11
2
0
27 May 2022
Regularized Gradient Descent Ascent for Two-Player Zero-Sum Markov Games
Sihan Zeng
Thinh T. Doan
J. Romberg
50
22
0
27 May 2022
Stochastic Second-Order Methods Improve Best-Known Sample Complexity of SGD for Gradient-Dominated Function
Saeed Masiha
Saber Salehkaleybar
Niao He
Negar Kiyavash
Patrick Thiran
79
18
0
25 May 2022
Policy Gradient Method For Robust Reinforcement Learning
Yue Wang
Shaofeng Zou
81
67
0
15 May 2022
Independent Natural Policy Gradient Methods for Potential Games: Finite-time Global Convergence with Entropy Regularization
Shicong Cen
Fan Chen
Yuejie Chi
21
15
0
12 Apr 2022
Linear convergence of a policy gradient method for some finite horizon continuous time control problems
C. Reisinger
Wolfgang Stockinger
Yufei Zhang
16
5
0
22 Mar 2022
Accelerating Primal-dual Methods for Regularized Markov Decision Processes
Haoya Li
Hsiang-Fu Yu
Lexing Ying
Inderjit Dhillon
26
4
0
21 Feb 2022
Finite-Time Analysis of Natural Actor-Critic for POMDPs
Semih Cayci
Niao He
R. Srikant
12
1
0
20 Feb 2022
Beyond the Policy Gradient Theorem for Efficient Policy Updates in Actor-Critic Algorithms
Romain Laroche
Rémi Tachet des Combes
25
2
0
15 Feb 2022
Understanding Curriculum Learning in Policy Optimization for Online Combinatorial Optimization
Runlong Zhou
Zelin He
Yuandong Tian
Yi Wu
S. Du
OffRL
18
3
0
11 Feb 2022
On the Global Convergence Rates of Decentralized Softmax Gradient Play in Markov Potential Games
Runyu Zhang
Jincheng Mei
Bo Dai
Dale Schuurmans
Na Li
26
20
0
02 Feb 2022
STOPS: Short-Term-based Volatility-controlled Policy Search and its Global Convergence
Liang Xu
Daoming Lyu
Yangchen Pan
Aiwen Jiang
Bo Liu
26
0
0
24 Jan 2022
Occupancy Information Ratio: Infinite-Horizon, Information-Directed, Parameterized Policy Search
Wesley A. Suttle
Alec Koppel
Ji Liu
12
0
0
21 Jan 2022
Convergence of Policy Gradient for Entropy Regularized MDPs with Neural Network Approximation in the Mean-Field Regime
B. Kerimkulov
J. Leahy
David Siska
Lukasz Szpruch
22
11
0
18 Jan 2022
An Alternate Policy Gradient Estimator for Softmax Policies
Shivam Garg
Samuele Tosatto
Yangchen Pan
Martha White
A. R. Mahmood
6
6
0
22 Dec 2021
Recent Advances in Reinforcement Learning in Finance
B. Hambly
Renyuan Xu
Huining Yang
OffRL
27
165
0
08 Dec 2021
Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch
Shangtong Zhang
Rémi Tachet des Combes
Romain Laroche
17
10
0
04 Nov 2021
Policy Optimization for Constrained MDPs with Provable Fast Global Convergence
Tao-Wen Liu
Ruida Zhou
D. Kalathil
P. R. Kumar
Chao Tian
6
19
0
31 Oct 2021
Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings
Matthew Shunshi Zhang
Murat A. Erdogdu
Animesh Garg
14
5
0
30 Oct 2021
Understanding the Effect of Stochasticity in Policy Optimization
Jincheng Mei
Bo Dai
Chenjun Xiao
Csaba Szepesvári
Dale Schuurmans
11
17
0
29 Oct 2021
Neural PPO-Clip Attains Global Optimality: A Hinge Loss Perspective
Nai-Chieh Huang
Ping-Chun Hsieh
Kuo-Hao Ho
Hsuan-Yu Yao
Kai-Chun Hu
Liang-Chun Ouyang
I-Chen Wu
22
1
0
26 Oct 2021
Finite-Time Complexity of Online Primal-Dual Natural Actor-Critic Algorithm for Constrained Markov Decision Processes
Sihan Zeng
Thinh T. Doan
J. Romberg
92
17
0
21 Oct 2021
Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process
Tianjiao Li
Ziwei Guan
Shaofeng Zou
Tengyu Xu
Yingbin Liang
Guanghui Lan
16
26
0
20 Oct 2021
Beyond Exact Gradients: Convergence of Stochastic Soft-Max Policy Gradient Methods with Entropy Regularization
Yuhao Ding
Junzi Zhang
Hyunin Lee
Javad Lavaei
24
18
0
19 Oct 2021
On the Global Optimum Convergence of Momentum-based Policy Gradient
Yuhao Ding
Junzi Zhang
Javad Lavaei
19
16
0
19 Oct 2021
Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs
Han Zhong
Zhuoran Yang
Zhaoran Wang
Csaba Szepesvári
17
21
0
18 Oct 2021
A Dual Approach to Constrained Markov Decision Processes with Entropy Regularization
Donghao Ying
Yuhao Ding
Javad Lavaei
6
32
0
17 Oct 2021
The Geometry of Memoryless Stochastic Policy Optimization in Infinite-Horizon POMDPs
Johannes Muller
Guido Montúfar
11
8
0
14 Oct 2021
The Benefits of Being Categorical Distributional: Uncertainty-aware Regularized Exploration in Reinforcement Learning
Ke Sun
Yingnan Zhao
Enze Shi
Yafei Wang
Xiaodong Yan
Bei Jiang
Linglong Kong
OOD
OffRL
UQCV
13
2
0
07 Oct 2021
Approximate Newton policy gradient algorithms
Haoya Li
Samarth Gupta
Hsiangfu Yu
Lexing Ying
Inderjit Dhillon
41
2
0
05 Oct 2021
A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning
Sihan Zeng
Thinh T. Doan
J. Romberg
63
22
0
29 Sep 2021
Dr Jekyll and Mr Hyde: the Strange Case of Off-Policy Policy Updates
Romain Laroche
Rémi Tachet des Combes
23
8
0
29 Sep 2021
Online Robust Reinforcement Learning with Model Uncertainty
Yue Wang
Shaofeng Zou
OOD
OffRL
68
96
0
29 Sep 2021
Theoretical Guarantees of Fictitious Discount Algorithms for Episodic Reinforcement Learning and Global Convergence of Policy Gradient Methods
Xin Guo
Anran Hu
Junzi Zhang
OffRL
16
6
0
13 Sep 2021
Reinforcement Learning for Load-balanced Parallel Particle Tracing
Jiayi Xu
Hanqi Guo
Han-Wei Shen
Mukund Raj
Skylar W. Wurster
Tom Peterka
14
6
0
13 Sep 2021
Multi-agent Natural Actor-critic Reinforcement Learning Algorithms
Prashant Trivedi
N. Hemachandra
13
4
0
03 Sep 2021
Global Convergence of the ODE Limit for Online Actor-Critic Algorithms in Reinforcement Learning
Ziheng Wang
Justin A. Sirignano
24
2
0
19 Aug 2021
A general class of surrogate functions for stable and efficient reinforcement learning
Sharan Vaswani
Olivier Bachem
Simone Totaro
Robert Mueller
Shivam Garg
M. Geist
Marlos C. Machado
P. S. Castro
Nicolas Le Roux
OffRL
24
15
0
12 Aug 2021
Variational Actor-Critic Algorithms
Yuhua Zhu
Lexing Ying
OffRL
15
0
0
03 Aug 2021
A general sample complexity analysis of vanilla policy gradient
Rui Yuan
Robert Mansel Gower
A. Lazaric
69
62
0
23 Jul 2021
Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences
Alan Chan
Hugo Silva
Sungsu Lim
Tadashi Kozuno
A. R. Mahmood
Martha White
9
29
0
17 Jul 2021
Cautious Policy Programming: Exploiting KL Regularization in Monotonic Policy Improvement for Reinforcement Learning
Lingwei Zhu
Toshinori Kitamura
Takamitsu Matsubara
OffRL
18
1
0
13 Jul 2021
MADE: Exploration via Maximizing Deviation from Explored Regions
Tianjun Zhang
Paria Rashidinejad
Jiantao Jiao
Yuandong Tian
Joseph E. Gonzalez
Stuart J. Russell
OffRL
27
42
0
18 Jun 2021
On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control
Amrit Singh Bedi
Anjaly Parayil
Junyu Zhang
Mengdi Wang
Alec Koppel
25
15
0
15 Jun 2021