An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods

15 November 2022

Papers citing "An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods"

23 / 73 papers shown

| Title | Authors | Tag | Counts | Date |
| --- | --- | --- | --- | --- |
| Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch | Shangtong Zhang, Rémi Tachet des Combes, Romain Laroche | | 17 10 0 | 04 Nov 2021 |
| Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings | Matthew Shunshi Zhang, Murat A. Erdogdu, Animesh Garg | | 6 5 0 | 30 Oct 2021 |
| Neural PPO-Clip Attains Global Optimality: A Hinge Loss Perspective | Nai-Chieh Huang, Ping-Chun Hsieh, Kuo-Hao Ho, Hsuan-Yu Yao, Kai-Chun Hu, Liang-Chun Ouyang, I-Chen Wu | | 20 1 0 | 26 Oct 2021 |
| Actor-critic is implicitly biased towards high entropy optimal policies | Yuzheng Hu, Ziwei Ji, Matus Telgarsky | | 52 11 0 | 21 Oct 2021 |
| Beyond Exact Gradients: Convergence of Stochastic Soft-Max Policy Gradient Methods with Entropy Regularization | Yuhao Ding, Junzi Zhang, Hyunin Lee, Javad Lavaei | | 16 18 0 | 19 Oct 2021 |
| On the Global Optimum Convergence of Momentum-based Policy Gradient | Yuhao Ding, Junzi Zhang, Javad Lavaei | | 19 16 0 | 19 Oct 2021 |
| Theoretical Guarantees of Fictitious Discount Algorithms for Episodic Reinforcement Learning and Global Convergence of Policy Gradient Methods | Xin Guo, Anran Hu, Junzi Zhang | OffRL | 14 6 0 | 13 Sep 2021 |
| On the Approximation of Cooperative Heterogeneous Multi-Agent Reinforcement Learning (MARL) using Mean Field Control (MFC) | Washim Uddin Mondal, Mridul Agarwal, Vaneet Aggarwal, S. Ukkusuri | | 33 43 0 | 09 Sep 2021 |
| Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis | Ziyi Chen, Yi Zhou, Rongrong Chen, Shaofeng Zou | | 13 24 0 | 08 Sep 2021 |
| A general sample complexity analysis of vanilla policy gradient | Rui Yuan, Robert Mansel Gower, A. Lazaric | | 64 62 0 | 23 Jul 2021 |
| Bregman Gradient Policy Optimization | Feihu Huang, Shangqian Gao, Heng-Chiao Huang | | 9 16 0 | 23 Jun 2021 |
| Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations | Christoph Dann, Yishay Mansour, M. Mohri, Ayush Sekhari, Karthik Sridharan | OffRL | 11 11 0 | 22 Jun 2021 |
| On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control | Amrit Singh Bedi, Anjaly Parayil, Junyu Zhang, Mengdi Wang, Alec Koppel | | 14 15 0 | 15 Jun 2021 |
| Joint Optimization of Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm | Qinbo Bai, Mridul Agarwal, Vaneet Aggarwal | | 8 7 0 | 28 May 2021 |
| A nearly Blackwell-optimal policy gradient method | Vektor Dewanto, M. Gallagher | OffRL | 8 0 0 | 28 May 2021 |
| Finite-Sample Analysis of Off-Policy Natural Actor-Critic with Linear Function Approximation | Zaiwei Chen, S. Khodadadian, S. T. Maguluri | OffRL | 43 29 0 | 26 May 2021 |
| Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence | Wenhao Zhan, Shicong Cen, Baihe Huang, Yuxin Chen, Jason D. Lee, Yuejie Chi | | 9 76 0 | 24 May 2021 |
| On the Linear convergence of Natural Policy Gradient Algorithm | S. Khodadadian, P. Jhunjhunwala, Sushil Mahavir Varma, S. T. Maguluri | | 22 56 0 | 04 May 2021 |
| Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality | Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin Liang | OffRL | 24 24 0 | 23 Feb 2021 |
| Softmax Policy Gradient Methods Can Take Exponential Time to Converge | Gen Li, Yuting Wei, Yuejie Chi, Yuxin Chen | | 13 50 0 | 22 Feb 2021 |
| Finite Sample Analysis of Two-Time-Scale Natural Actor-Critic Algorithm | S. Khodadadian, Thinh T. Doan, J. Romberg, S. T. Maguluri | | 17 42 0 | 26 Jan 2021 |
| Smoothed functional-based gradient algorithms for off-policy reinforcement learning: A non-asymptotic viewpoint | Nithia Vijayan, Prashanth L. A. | OffRL | 11 6 0 | 06 Jan 2021 |
| A Finite Time Analysis of Two Time-Scale Actor Critic Methods | Yue Wu, Weitong Zhang, Pan Xu, Quanquan Gu | | 88 145 0 | 04 May 2020 |