Sample Complexity of Policy Gradient Finding Second-Order Stationary Points

AAAI Conference on Artificial Intelligence (AAAI), 2020
2 December 2020
Long Yang, Qian Zheng, Gang Pan

Papers citing "Sample Complexity of Policy Gradient Finding Second-Order Stationary Points"

17 papers
A Communication-Efficient Decentralized Actor-Critic Algorithm
Xiaoxing Ren, Nicola Bastianello, Thomas Parisini, Andreas A. Malikopoulos
22 Oct 2025
Policy Newton methods for Distortion Riskmetrics
Soumen Pachal, Mizhaan Prajit Maniyar, Prashanth L.A.
10 Aug 2025
Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2024
Wenjia Meng, Qian Zheng, Long Yang, Yilong Yin, Gang Pan
04 May 2024
Efficiently Escaping Saddle Points for Policy Optimization
Conference on Uncertainty in Artificial Intelligence (UAI), 2023
Sadegh Khorasani, Saber Salehkaleybar, Negar Kiyavash, Niao He, Matthias Grossglauser
15 Nov 2023
On the Second-Order Convergence of Biased Policy Gradient Algorithms
International Conference on Machine Learning (ICML), 2023
Siqiao Mu, Diego Klabjan
05 Nov 2023
Controlling Type Confounding in Ad Hoc Teamwork with Instance-wise Teammate Feedback Rectification
International Conference on Machine Learning (ICML), 2023
Dong Xing, Pengjie Gu, Qian Zheng, Xinrun Wang, Shanqi Liu, Longtao Zheng, Bo An, Gang Pan
19 Jun 2023
A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
Mizhaan Prajit Maniyar, Akash Mondal, Prashanth L.A., S. Bhatnagar
21 Apr 2023
Stochastic Policy Gradient Methods: Improved Sample Complexity for Fisher-non-degenerate Policies
International Conference on Machine Learning (ICML), 2023
Ilyas Fatkhullin, Anas Barakat, Anastasia Kireeva, Niao He
03 Feb 2023
Stochastic Dimension-reduced Second-order Methods for Policy Optimization
Jinsong Liu, Chen Xie, Qinwen Deng, Dongdong Ge, Yi-Li Ye
28 Jan 2023
Constrained Update Projection Approach to Safe Policy Optimization
Neural Information Processing Systems (NeurIPS), 2022
Long Yang, Jiaming Ji, Juntao Dai, Linrui Zhang, Binbin Zhou, Pengfei Li, Yaodong Yang, Gang Pan
15 Sep 2022
Stochastic Second-Order Methods Improve Best-Known Sample Complexity of SGD for Gradient-Dominated Function
Neural Information Processing Systems (NeurIPS), 2022
Saeed Masiha, Saber Salehkaleybar, Niao He, Negar Kiyavash, Patrick Thiran
25 May 2022
A Review of Safe Reinforcement Learning: Methods, Theory and Applications
Shangding Gu, Longyu Yang, Yali Du, Guang Chen, Florian Walter, Jun Wang, Alois C. Knoll
20 May 2022
TinyLight: Adaptive Traffic Signal Control on Devices with Extremely Limited Resources
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Dong Xing, Qian Zheng, Qianhui Liu, Gang Pan
01 May 2022
CUP: A Conservative Update Policy Algorithm for Safe Reinforcement Learning
Long Yang, Jiaming Ji, Juntao Dai, Yu Zhang, Pengfei Li, Gang Pan
15 Feb 2022
Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis
International Conference on Machine Learning (ICML), 2021
Ziyi Chen, Yi Zhou, Rongrong Chen, Shaofeng Zou
08 Sep 2021
A nearly Blackwell-optimal policy gradient method
Vektor Dewanto, M. Gallagher
28 May 2021
Policy Optimization with Stochastic Mirror Descent
AAAI Conference on Artificial Intelligence (AAAI), 2019
Long Yang, Yu Zhang, Gang Zheng, Qian Zheng, Pengfei Li, Jianhang Huang, Jun Wen, Gang Pan
25 Jun 2019