2005.06392
Cited By
On the Global Convergence Rates of Softmax Policy Gradient Methods
13 May 2020
Jincheng Mei
Chenjun Xiao
Csaba Szepesvári
Dale Schuurmans
Papers citing
"On the Global Convergence Rates of Softmax Policy Gradient Methods"
35 / 185 papers shown
Title
Linear Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation
Semih Cayci
Niao He
R. Srikant
21
35
0
08 Jun 2021
On the Convergence Rate of Off-Policy Policy Optimization Methods with Density-Ratio Correction
Jiawei Huang
Nan Jiang
11
5
0
02 Jun 2021
Gradient play in stochastic games: stationary points, convergence, and sample complexity
Runyu Zhang
Zhaolin Ren
Na Li
18
43
0
01 Jun 2021
Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization
Shicong Cen
Yuting Wei
Yuejie Chi
24
77
0
31 May 2021
Joint Optimization of Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm
Qinbo Bai
Mridul Agarwal
Vaneet Aggarwal
8
7
0
28 May 2021
Finite-Sample Analysis of Off-Policy Natural Actor-Critic with Linear Function Approximation
Zaiwei Chen
S. Khodadadian
S. T. Maguluri
OffRL
49
29
0
26 May 2021
Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence
Wenhao Zhan
Shicong Cen
Baihe Huang
Yuxin Chen
Jason D. Lee
Yuejie Chi
19
76
0
24 May 2021
Leveraging Non-uniformity in First-order Non-convex Optimization
Jincheng Mei
Yue Gao
Bo Dai
Csaba Szepesvári
Dale Schuurmans
20
48
0
13 May 2021
On the Linear convergence of Natural Policy Gradient Algorithm
S. Khodadadian
P. Jhunjhunwala
Sushil Mahavir Varma
S. T. Maguluri
30
56
0
04 May 2021
Softmax Policy Gradient Methods Can Take Exponential Time to Converge
Gen Li
Yuting Wei
Yuejie Chi
Yuxin Chen
21
50
0
22 Feb 2021
Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm
S. Khodadadian
Zaiwei Chen
S. T. Maguluri
CML
OffRL
69
26
0
18 Feb 2021
On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method
Junyu Zhang
Chengzhuo Ni
Zheng Yu
Csaba Szepesvári
Mengdi Wang
44
67
0
17 Feb 2021
Improper Reinforcement Learning with Gradient-based Policy Optimization
Mohammadi Zaki
Avinash Mohan
Aditya Gopalan
Shie Mannor
6
0
0
16 Feb 2021
Optimization Issues in KL-Constrained Approximate Policy Iteration
N. Lazić
Botao Hao
Yasin Abbasi-Yadkori
Dale Schuurmans
Csaba Szepesvári
11
10
0
11 Feb 2021
Policy Mirror Descent for Reinforcement Learning: Linear Convergence, New Sampling Complexity, and Generalized Problem Classes
Guanghui Lan
87
136
0
30 Jan 2021
Finite Sample Analysis of Two-Time-Scale Natural Actor-Critic Algorithm
S. Khodadadian
Thinh T. Doan
J. Romberg
S. T. Maguluri
25
42
0
26 Jan 2021
Smoothed functional-based gradient algorithms for off-policy reinforcement learning: A non-asymptotic viewpoint
Nithia Vijayan
Prashanth L. A.
OffRL
19
6
0
06 Jan 2021
Towards Understanding Asynchronous Advantage Actor-critic: Convergence and Linear Speedup
Han Shen
K. Zhang
Mingyi Hong
Tianyi Chen
19
28
0
31 Dec 2020
Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy
Han Zhong
Xun Deng
Ethan X. Fang
Zhuoran Yang
Zhaoran Wang
Runze Li
8
3
0
28 Dec 2020
A New Bandit Setting Balancing Information from State Evolution and Corrupted Context
Alexander Galozy
Sławomir Nowaczyk
Mattias Ohlsson
OffRL
20
2
0
16 Nov 2020
Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View
Christos Thrampoulidis
Samet Oymak
Mahdi Soltanolkotabi
11
41
0
16 Nov 2020
Finding the Near Optimal Policy via Adaptive Reduced Regularization in MDPs
Wenhao Yang
Xiang Li
Guangzeng Xie
Zhihua Zhang
40
5
0
31 Oct 2020
Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime
Andrea Agazzi
Jianfeng Lu
11
15
0
22 Oct 2020
Sample Efficient Reinforcement Learning with REINFORCE
Junzi Zhang
Jongho Kim
Brendan O'Donoghue
Stephen P. Boyd
35
98
0
22 Oct 2020
On The Convergence of First Order Methods for Quasar-Convex Optimization
Jikai Jin
14
9
0
10 Oct 2020
Beyond variance reduction: Understanding the true impact of baselines on policy optimization
Wesley Chung
Valentin Thomas
Marlos C. Machado
Nicolas Le Roux
OffRL
6
22
0
31 Aug 2020
Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy
Zuyue Fu
Zhuoran Yang
Zhaoran Wang
13
42
0
02 Aug 2020
Approximation Benefits of Policy Gradient Methods with Aggregated States
Daniel Russo
38
7
0
22 Jul 2020
On Linear Convergence of Policy Gradient Methods for Finite MDPs
Jalaj Bhandari
Daniel Russo
53
59
0
21 Jul 2020
A Short Note on Soft-max and Policy Gradients in Bandits Problems
N. Walton
6
1
0
20 Jul 2020
Regret Analysis of a Markov Policy Gradient Algorithm for Multi-arm Bandits
D. Denisov
N. Walton
13
8
0
20 Jul 2020
Variational Policy Gradient Method for Reinforcement Learning with General Utilities
Junyu Zhang
Alec Koppel
Amrit Singh Bedi
Csaba Szepesvári
Mengdi Wang
6
137
0
04 Jul 2020
Meta-Learning Bandit Policies by Gradient Ascent
B. Kveton
Martin Mladenov
Chih-Wei Hsu
Manzil Zaheer
Csaba Szepesvári
Craig Boutilier
30
9
0
09 Jun 2020
A General Framework for Learning Mean-Field Games
Xin Guo
Anran Hu
Renyuan Xu
Junzi Zhang
OffRL
AI4CE
27
49
0
13 Mar 2020
Differentiable Bandit Exploration
Craig Boutilier
Chih-Wei Hsu
B. Kveton
Martin Mladenov
Csaba Szepesvári
Manzil Zaheer
BDL
OffRL
14
7
0
17 Feb 2020