Sample Complexity of Policy Gradient Finding Second-Order Stationary Points

AAAI Conference on Artificial Intelligence (AAAI), 2020
2 December 2020
Long Yang, Qian Zheng, Gang Pan

Papers citing "Sample Complexity of Policy Gradient Finding Second-Order Stationary Points"

17 papers
A Communication-Efficient Decentralized Actor-Critic Algorithm
Xiaoxing Ren, Nicola Bastianello, Thomas Parisini, Andreas A. Malikopoulos
22 Oct 2025
Policy Newton methods for Distortion Riskmetrics
Soumen Pachal, Mizhaan Prajit Maniyar, Prashanth L.A.
10 Aug 2025
Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2024
Wenjia Meng, Qian Zheng, Long Yang, Yilong Yin, Gang Pan
04 May 2024
Efficiently Escaping Saddle Points for Policy Optimization
Conference on Uncertainty in Artificial Intelligence (UAI), 2023
Sadegh Khorasani, Saber Salehkaleybar, Negar Kiyavash, Niao He, Matthias Grossglauser
15 Nov 2023
On the Second-Order Convergence of Biased Policy Gradient Algorithms
International Conference on Machine Learning (ICML), 2023
Siqiao Mu, Diego Klabjan
05 Nov 2023
Controlling Type Confounding in Ad Hoc Teamwork with Instance-wise Teammate Feedback Rectification
International Conference on Machine Learning (ICML), 2023
Dong Xing, Pengjie Gu, Qian Zheng, Xinrun Wang, Shanqi Liu, Longtao Zheng, Bo An, Gang Pan
19 Jun 2023
A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
Mizhaan Prajit Maniyar, Akash Mondal, Prashanth L.A., S. Bhatnagar
21 Apr 2023
Stochastic Policy Gradient Methods: Improved Sample Complexity for Fisher-non-degenerate Policies
International Conference on Machine Learning (ICML), 2023
Ilyas Fatkhullin, Anas Barakat, Anastasia Kireeva, Niao He
03 Feb 2023
Stochastic Dimension-reduced Second-order Methods for Policy Optimization
Jinsong Liu, Chen Xie, Qinwen Deng, Dongdong Ge, Yi-Li Ye
28 Jan 2023
Constrained Update Projection Approach to Safe Policy Optimization
Neural Information Processing Systems (NeurIPS), 2022
Long Yang, Jiaming Ji, Juntao Dai, Linrui Zhang, Binbin Zhou, Pengfei Li, Yaodong Yang, Gang Pan
15 Sep 2022
Stochastic Second-Order Methods Improve Best-Known Sample Complexity of SGD for Gradient-Dominated Function
Neural Information Processing Systems (NeurIPS), 2022
Saeed Masiha, Saber Salehkaleybar, Niao He, Negar Kiyavash, Patrick Thiran
25 May 2022
A Review of Safe Reinforcement Learning: Methods, Theory and Applications
Shangding Gu, Longyu Yang, Yali Du, Guang Chen, Florian Walter, Jun Wang, Alois C. Knoll
20 May 2022
TinyLight: Adaptive Traffic Signal Control on Devices with Extremely Limited Resources
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Dong Xing, Qian Zheng, Qianhui Liu, Gang Pan
01 May 2022
CUP: A Conservative Update Policy Algorithm for Safe Reinforcement Learning
Long Yang, Jiaming Ji, Juntao Dai, Yu Zhang, Pengfei Li, Gang Pan
15 Feb 2022
Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis
International Conference on Machine Learning (ICML), 2021
Ziyi Chen, Yi Zhou, Rongrong Chen, Shaofeng Zou
08 Sep 2021
A nearly Blackwell-optimal policy gradient method
Vektor Dewanto, M. Gallagher
28 May 2021
Policy Optimization with Stochastic Mirror Descent
AAAI Conference on Artificial Intelligence (AAAI), 2019
Long Yang, Yu Zhang, Gang Zheng, Qian Zheng, Pengfei Li, Jianhang Huang, Jun Wen, Gang Pan
25 Jun 2019