On the Global Convergence Rates of Softmax Policy Gradient Methods

13 May 2020

Papers citing "On the Global Convergence Rates of Softmax Policy Gradient Methods"

35 / 185 papers shown

Title
Linear Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation Semih Cayci Niao He R. Srikant 21 35 0 08 Jun 2021
On the Convergence Rate of Off-Policy Policy Optimization Methods with Density-Ratio Correction Jiawei Huang Nan Jiang 11 5 0 02 Jun 2021
Gradient play in stochastic games: stationary points, convergence, and sample complexity Runyu Zhang Zhaolin Ren Na Li 18 43 0 01 Jun 2021
Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization Shicong Cen Yuting Wei Yuejie Chi 24 77 0 31 May 2021
Joint Optimization of Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm Qinbo Bai Mridul Agarwal Vaneet Aggarwal 8 7 0 28 May 2021
Finite-Sample Analysis of Off-Policy Natural Actor-Critic with Linear Function Approximation Zaiwei Chen S. Khodadadian S. T. Maguluri OffRL 49 29 0 26 May 2021
Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence Wenhao Zhan Shicong Cen Baihe Huang Yuxin Chen Jason D. Lee Yuejie Chi 19 76 0 24 May 2021
Leveraging Non-uniformity in First-order Non-convex Optimization Jincheng Mei Yue Gao Bo Dai Csaba Szepesvári Dale Schuurmans 20 48 0 13 May 2021
On the Linear convergence of Natural Policy Gradient Algorithm S. Khodadadian P. Jhunjhunwala Sushil Mahavir Varma S. T. Maguluri 30 56 0 04 May 2021
Softmax Policy Gradient Methods Can Take Exponential Time to Converge Gen Li Yuting Wei Yuejie Chi Yuxin Chen 21 50 0 22 Feb 2021
Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm S. Khodadadian Zaiwei Chen S. T. Maguluri CML OffRL 69 26 0 18 Feb 2021
On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method Junyu Zhang Chengzhuo Ni Zheng Yu Csaba Szepesvári Mengdi Wang 44 67 0 17 Feb 2021
Improper Reinforcement Learning with Gradient-based Policy Optimization Mohammadi Zaki Avinash Mohan Aditya Gopalan Shie Mannor 6 0 0 16 Feb 2021
Optimization Issues in KL-Constrained Approximate Policy Iteration N. Lazić Botao Hao Yasin Abbasi-Yadkori Dale Schuurmans Csaba Szepesvári 11 10 0 11 Feb 2021
Policy Mirror Descent for Reinforcement Learning: Linear Convergence, New Sampling Complexity, and Generalized Problem Classes Guanghui Lan 87 136 0 30 Jan 2021
Finite Sample Analysis of Two-Time-Scale Natural Actor-Critic Algorithm S. Khodadadian Thinh T. Doan J. Romberg S. T. Maguluri 25 42 0 26 Jan 2021
Smoothed functional-based gradient algorithms for off-policy reinforcement learning: A non-asymptotic viewpoint Nithia Vijayan Prashanth L. A. OffRL 19 6 0 06 Jan 2021
Towards Understanding Asynchronous Advantage Actor-critic: Convergence and Linear Speedup Han Shen K. Zhang Mingyi Hong Tianyi Chen 19 28 0 31 Dec 2020
Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy Han Zhong Xun Deng Ethan X. Fang Zhuoran Yang Zhaoran Wang Runze Li 8 3 0 28 Dec 2020
A New Bandit Setting Balancing Information from State Evolution and Corrupted Context Alexander Galozy Sławomir Nowaczyk Mattias Ohlsson OffRL 20 2 0 16 Nov 2020
Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View Christos Thrampoulidis Samet Oymak Mahdi Soltanolkotabi 11 41 0 16 Nov 2020
Finding the Near Optimal Policy via Adaptive Reduced Regularization in MDPs Wenhao Yang Xiang Li Guangzeng Xie Zhihua Zhang 40 5 0 31 Oct 2020
Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime Andrea Agazzi Jianfeng Lu 11 15 0 22 Oct 2020
Sample Efficient Reinforcement Learning with REINFORCE Junzi Zhang Jongho Kim Brendan O'Donoghue Stephen P. Boyd 35 98 0 22 Oct 2020
On The Convergence of First Order Methods for Quasar-Convex Optimization Jikai Jin 14 9 0 10 Oct 2020
Beyond variance reduction: Understanding the true impact of baselines on policy optimization Wesley Chung Valentin Thomas Marlos C. Machado Nicolas Le Roux OffRL 6 22 0 31 Aug 2020
Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy Zuyue Fu Zhuoran Yang Zhaoran Wang 13 42 0 02 Aug 2020
Approximation Benefits of Policy Gradient Methods with Aggregated States Daniel Russo 38 7 0 22 Jul 2020
On Linear Convergence of Policy Gradient Methods for Finite MDPs Jalaj Bhandari Daniel Russo 53 59 0 21 Jul 2020
A Short Note on Soft-max and Policy Gradients in Bandits Problems N. Walton 6 1 0 20 Jul 2020
Regret Analysis of a Markov Policy Gradient Algorithm for Multi-arm Bandits D. Denisov N. Walton 13 8 0 20 Jul 2020
Variational Policy Gradient Method for Reinforcement Learning with General Utilities Junyu Zhang Alec Koppel Amrit Singh Bedi Csaba Szepesvári Mengdi Wang 6 137 0 04 Jul 2020
Meta-Learning Bandit Policies by Gradient Ascent B. Kveton Martin Mladenov Chih-Wei Hsu Manzil Zaheer Csaba Szepesvári Craig Boutilier 30 9 0 09 Jun 2020
A General Framework for Learning Mean-Field Games Xin Guo Anran Hu Renyuan Xu Junzi Zhang OffRL AI4CE 27 49 0 13 Mar 2020
Differentiable Bandit Exploration Craig Boutilier Chih-Wei Hsu B. Kveton Martin Mladenov Csaba Szepesvári Manzil Zaheer BDL OffRL 14 7 0 17 Feb 2020