On the Global Convergence Rates of Softmax Policy Gradient Methods

13 May 2020

Papers citing "On the Global Convergence Rates of Softmax Policy Gradient Methods"

50 / 185 papers shown

Title
Dealing with Sparse Rewards in Continuous Control Robotics via Heavy-Tailed Policies Souradip Chakraborty Amrit Singh Bedi Alec Koppel Pratap Tokekar Dinesh Manocha 10 8 0 12 Jun 2022
Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning Ruida Zhou Tao-Wen Liu D. Kalathil P. R. Kumar Chao Tian 21 13 0 10 Jun 2022
Policy Optimization for Markov Games: Unified Framework and Faster Convergence Runyu Zhang Qinghua Liu Haiquan Wang Caiming Xiong Na Li Yu Bai 13 26 0 06 Jun 2022
Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPs Dongsheng Ding K. Zhang Jiali Duan Tamer Başar Mihailo R. Jovanović 13 19 0 06 Jun 2022
Algorithm for Constrained Markov Decision Process with Linear Convergence E. Gladin Maksim Lavrik-Karmazin K. Zainullina Varvara Rudenko Alexander V. Gasnikov Martin Takáč 20 6 0 03 Jun 2022
KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal Tadashi Kozuno Wenhao Yang Nino Vieillard Toshinori Kitamura Yunhao Tang ... Michal Valko Rémi Munos Olivier Pietquin M. Geist Csaba Szepesvári 97 10 0 27 May 2022
Solving infinite-horizon POMDPs with memoryless stochastic policies in state-action space Johannes Müller Guido Montúfar 11 2 0 27 May 2022
Regularized Gradient Descent Ascent for Two-Player Zero-Sum Markov Games Sihan Zeng Thinh T. Doan J. Romberg 50 22 0 27 May 2022
Stochastic Second-Order Methods Improve Best-Known Sample Complexity of SGD for Gradient-Dominated Function Saeed Masiha Saber Salehkaleybar Niao He Negar Kiyavash Patrick Thiran 79 18 0 25 May 2022
Policy Gradient Method For Robust Reinforcement Learning Yue Wang Shaofeng Zou 81 67 0 15 May 2022
Independent Natural Policy Gradient Methods for Potential Games: Finite-time Global Convergence with Entropy Regularization Shicong Cen Fan Chen Yuejie Chi 21 15 0 12 Apr 2022
Linear convergence of a policy gradient method for some finite horizon continuous time control problems C. Reisinger Wolfgang Stockinger Yufei Zhang 16 5 0 22 Mar 2022
Accelerating Primal-dual Methods for Regularized Markov Decision Processes Haoya Li Hsiang-Fu Yu Lexing Ying Inderjit Dhillon 26 4 0 21 Feb 2022
Finite-Time Analysis of Natural Actor-Critic for POMDPs Semih Cayci Niao He R. Srikant 12 1 0 20 Feb 2022
Beyond the Policy Gradient Theorem for Efficient Policy Updates in Actor-Critic Algorithms Romain Laroche Rémi Tachet des Combes 25 2 0 15 Feb 2022
Understanding Curriculum Learning in Policy Optimization for Online Combinatorial Optimization Runlong Zhou Zelin He Yuandong Tian Yi Wu S. Du OffRL 18 3 0 11 Feb 2022
On the Global Convergence Rates of Decentralized Softmax Gradient Play in Markov Potential Games Runyu Zhang Jincheng Mei Bo Dai Dale Schuurmans Na Li 26 20 0 02 Feb 2022
STOPS: Short-Term-based Volatility-controlled Policy Search and its Global Convergence Liang Xu Daoming Lyu Yangchen Pan Aiwen Jiang Bo Liu 26 0 0 24 Jan 2022
Occupancy Information Ratio: Infinite-Horizon, Information-Directed, Parameterized Policy Search Wesley A. Suttle Alec Koppel Ji Liu 12 0 0 21 Jan 2022
Convergence of Policy Gradient for Entropy Regularized MDPs with Neural Network Approximation in the Mean-Field Regime B. Kerimkulov J. Leahy David Siska Lukasz Szpruch 22 11 0 18 Jan 2022
An Alternate Policy Gradient Estimator for Softmax Policies Shivam Garg Samuele Tosatto Yangchen Pan Martha White A. R. Mahmood 6 6 0 22 Dec 2021
Recent Advances in Reinforcement Learning in Finance B. Hambly Renyuan Xu Huining Yang OffRL 27 165 0 08 Dec 2021
Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch Shangtong Zhang Rémi Tachet des Combes Romain Laroche 17 10 0 04 Nov 2021
Policy Optimization for Constrained MDPs with Provable Fast Global Convergence Tao-Wen Liu Ruida Zhou D. Kalathil P. R. Kumar Chao Tian 6 19 0 31 Oct 2021
Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings Matthew Shunshi Zhang Murat A. Erdogdu Animesh Garg 14 5 0 30 Oct 2021
Understanding the Effect of Stochasticity in Policy Optimization Jincheng Mei Bo Dai Chenjun Xiao Csaba Szepesvári Dale Schuurmans 11 17 0 29 Oct 2021
Neural PPO-Clip Attains Global Optimality: A Hinge Loss Perspective Nai-Chieh Huang Ping-Chun Hsieh Kuo-Hao Ho Hsuan-Yu Yao Kai-Chun Hu Liang-Chun Ouyang I-Chen Wu 22 1 0 26 Oct 2021
Finite-Time Complexity of Online Primal-Dual Natural Actor-Critic Algorithm for Constrained Markov Decision Processes Sihan Zeng Thinh T. Doan J. Romberg 92 17 0 21 Oct 2021
Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process Tianjiao Li Ziwei Guan Shaofeng Zou Tengyu Xu Yingbin Liang Guanghui Lan 16 26 0 20 Oct 2021
Beyond Exact Gradients: Convergence of Stochastic Soft-Max Policy Gradient Methods with Entropy Regularization Yuhao Ding Junzi Zhang Hyunin Lee Javad Lavaei 24 18 0 19 Oct 2021
On the Global Optimum Convergence of Momentum-based Policy Gradient Yuhao Ding Junzi Zhang Javad Lavaei 19 16 0 19 Oct 2021
Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs Han Zhong Zhuoran Yang Zhaoran Wang Csaba Szepesvári 17 21 0 18 Oct 2021
A Dual Approach to Constrained Markov Decision Processes with Entropy Regularization Donghao Ying Yuhao Ding Javad Lavaei 6 32 0 17 Oct 2021
The Geometry of Memoryless Stochastic Policy Optimization in Infinite-Horizon POMDPs Johannes Müller Guido Montúfar 11 8 0 14 Oct 2021
The Benefits of Being Categorical Distributional: Uncertainty-aware Regularized Exploration in Reinforcement Learning Ke Sun Yingnan Zhao Enze Shi Yafei Wang Xiaodong Yan Bei Jiang Linglong Kong OOD OffRL UQCV 13 2 0 07 Oct 2021
Approximate Newton policy gradient algorithms Haoya Li Samarth Gupta Hsiang-Fu Yu Lexing Ying Inderjit Dhillon 41 2 0 05 Oct 2021
A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning Sihan Zeng Thinh T. Doan J. Romberg 63 22 0 29 Sep 2021
Dr Jekyll and Mr Hyde: the Strange Case of Off-Policy Policy Updates Romain Laroche Rémi Tachet des Combes 23 8 0 29 Sep 2021
Online Robust Reinforcement Learning with Model Uncertainty Yue Wang Shaofeng Zou OOD OffRL 68 96 0 29 Sep 2021
Theoretical Guarantees of Fictitious Discount Algorithms for Episodic Reinforcement Learning and Global Convergence of Policy Gradient Methods Xin Guo Anran Hu Junzi Zhang OffRL 16 6 0 13 Sep 2021
Reinforcement Learning for Load-balanced Parallel Particle Tracing Jiayi Xu Hanqi Guo Han-Wei Shen Mukund Raj Skylar W. Wurster Tom Peterka 14 6 0 13 Sep 2021
Multi-agent Natural Actor-critic Reinforcement Learning Algorithms Prashant Trivedi N. Hemachandra 13 4 0 03 Sep 2021
Global Convergence of the ODE Limit for Online Actor-Critic Algorithms in Reinforcement Learning Ziheng Wang Justin A. Sirignano 24 2 0 19 Aug 2021
A general class of surrogate functions for stable and efficient reinforcement learning Sharan Vaswani Olivier Bachem Simone Totaro Robert Mueller Shivam Garg M. Geist Marlos C. Machado P. S. Castro Nicolas Le Roux OffRL 24 15 0 12 Aug 2021
Variational Actor-Critic Algorithms Yuhua Zhu Lexing Ying OffRL 15 0 0 03 Aug 2021
A general sample complexity analysis of vanilla policy gradient Rui Yuan Robert Mansel Gower A. Lazaric 69 62 0 23 Jul 2021
Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences Alan Chan Hugo Silva Sungsu Lim Tadashi Kozuno A. R. Mahmood Martha White 9 29 0 17 Jul 2021
Cautious Policy Programming: Exploiting KL Regularization in Monotonic Policy Improvement for Reinforcement Learning Lingwei Zhu Toshinori Kitamura Takamitsu Matsubara OffRL 18 1 0 13 Jul 2021
MADE: Exploration via Maximizing Deviation from Explored Regions Tianjun Zhang Paria Rashidinejad Jiantao Jiao Yuandong Tian Joseph E. Gonzalez Stuart J. Russell OffRL 27 42 0 18 Jun 2021
On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control Amrit Singh Bedi Anjaly Parayil Junyu Zhang Mengdi Wang Alec Koppel 25 15 0 15 Jun 2021