
On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method
Neural Information Processing Systems (NeurIPS), 2021
17 February 2021
Junyu Zhang, Chengzhuo Ni, Zheng Yu, Csaba Szepesvári, Mengdi Wang
arXiv: 2102.08607

Papers citing "On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method"

47 papers
Bayesian Risk-Sensitive Policy Optimization For MDPs With General Loss Functions
Xiaoshuang Wang, Yifan Lin, Enlu Zhou
19 Sep 2025

Online Episodic Convex Reinforcement Learning
B. Moreno, Khaled Eldowa, Pierre Gaillard, Margaux Brégère, Nadia Oudjane
12 May 2025

Robo-taxi Fleet Coordination at Scale via Reinforcement Learning
Luigi Tresca, Carolin Schmidt, James Harrison, Filipe Rodrigues, G. Zardini, Daniele Gammelli, Marco Pavone
08 Apr 2025

Enhancing PPO with Trajectory-Aware Hybrid Policies
Qisai Liu, Zhanhong Jiang, Hsin-Jung Yang, Mahsa Khosravi, Joshua R. Waite, Soumik Sarkar
21 Feb 2025

Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates
Neural Information Processing Systems (NeurIPS), 2025
Jincheng Mei, Bo Dai, Alekh Agarwal, Sharan Vaswani, Anant Raj, Csaba Szepesvári, Dale Schuurmans
11 Feb 2025
Transformer-based Model Predictive Control: Trajectory Optimization via Sequence Modeling
IEEE Robotics and Automation Letters (RA-L), 2024
Davide Celestini, Daniele Gammelli, T. Guffanti, Simone D'Amico, Elisa Capello, Marco Pavone
31 Oct 2024

From Gradient Clipping to Normalization for Heavy Tailed SGD
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Florian Hübler, Ilyas Fatkhullin, Niao He
17 Oct 2024

Last-Iterate Convergence of General Parameterized Policies in Constrained MDPs
Washim Uddin Mondal, Vaneet Aggarwal
21 Aug 2024

MetaCURL: Non-stationary Concave Utility Reinforcement Learning
B. Moreno, Margaux Brégère, Pierre Gaillard, Nadia Oudjane
30 May 2024

Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2024
Wenjia Meng, Qian Zheng, Long Yang, Yilong Yin, Gang Pan
04 May 2024

Global Convergence Guarantees for Federated Policy Gradient Methods with Adversaries
Swetha Ganesh, Jiayu Chen, Gugan Thoppe, Vaneet Aggarwal
15 Mar 2024
Taming Nonconvex Stochastic Mirror Descent with General Bregman Divergence
Ilyas Fatkhullin, Niao He
27 Feb 2024

Stochastic Gradient Succeeds for Bandits
Jincheng Mei, Zixin Zhong, Bo Dai, Alekh Agarwal, Csaba Szepesvári, Dale Schuurmans
27 Feb 2024

MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences
Souradip Chakraborty, Jiahao Qiu, Hui Yuan, Alec Koppel, Furong Huang, Dinesh Manocha, Amrit Singh Bedi, Mengdi Wang
14 Feb 2024

On the Stochastic (Variance-Reduced) Proximal Gradient Method for Regularized Expected Reward Optimization
Ling Liang, Haizhao Yang
23 Jan 2024

Global Convergence of Natural Policy Gradient with Hessian-aided Momentum Variance Reduction
Journal of Scientific Computing (J. Sci. Comput.), 2024
Jie Feng, Ke Wei, Jinchi Chen
02 Jan 2024

Efficiently Escaping Saddle Points for Policy Optimization
Conference on Uncertainty in Artificial Intelligence (UAI), 2023
Sadegh Khorasani, Saber Salehkaleybar, Negar Kiyavash, Niao He, Matthias Grossglauser
15 Nov 2023
Improved Sample Complexity Analysis of Natural Policy Gradient Algorithm with General Parameterization for Infinite Horizon Discounted Reward Markov Decision Processes
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
Washim Uddin Mondal, Vaneet Aggarwal
18 Oct 2023

Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes
AAAI Conference on Artificial Intelligence (AAAI), 2023
Qinbo Bai, Washim Uddin Mondal, Vaneet Aggarwal
05 Sep 2023

An Adaptive Optimization Approach to Personalized Financial Incentives in Mobile Behavioral Weight Loss Interventions
Qiaomei Li, Kara L. Gavin, Corrine L. Voils, Yonatan Dov Mintz
01 Jul 2023

Reinforcement Learning with General Utilities: Simpler Variance Reduction and Large State-Action Space
International Conference on Machine Learning (ICML), 2023
Anas Barakat, Ilyas Fatkhullin, Niao He
02 Jun 2023

Scalable Primal-Dual Actor-Critic Method for Safe Multi-Agent RL with General Utilities
Neural Information Processing Systems (NeurIPS), 2023
Donghao Ying, Yunkai Zhang, Yuhao Ding, Alec Koppel, Javad Lavaei
27 May 2023

Regret-Optimal Model-Free Reinforcement Learning for Discounted MDPs with Short Burn-In Time
Neural Information Processing Systems (NeurIPS), 2023
Xiang Ji, Gen Li
24 May 2023
Instruction Tuned Models are Quick Learners
Himanshu Gupta, Saurabh Arjun Sawant, Swaroop Mishra, Mutsumi Nakamura, Arindam Mitra, Santosh Mashetty, Chitta Baral
17 May 2023

Scalable Multi-Agent Reinforcement Learning with General Utilities
American Control Conference (ACC), 2023
Donghao Ying, Yuhao Ding, Alec Koppel, Javad Lavaei
15 Feb 2023

Stochastic Policy Gradient Methods: Improved Sample Complexity for Fisher-non-degenerate Policies
International Conference on Machine Learning (ICML), 2023
Ilyas Fatkhullin, Anas Barakat, Anastasia Kireeva, Niao He
03 Feb 2023

A Novel Framework for Policy Mirror Descent with General Parameterization and Linear Convergence
Neural Information Processing Systems (NeurIPS), 2023
Carlo Alfano, Rui Yuan, Patrick Rebeschini
30 Jan 2023

Stochastic Dimension-reduced Second-order Methods for Policy Optimization
Jinsong Liu, Chen Xie, Qinwen Deng, Dongdong Ge, Yi-Li Ye
28 Jan 2023

The Role of Baselines in Policy Gradient Optimization
Neural Information Processing Systems (NeurIPS), 2023
Jincheng Mei, Wesley Chung, Valentin Thomas, Bo Dai, Csaba Szepesvári, Dale Schuurmans
16 Jan 2023
Variance-Reduced Conservative Policy Iteration
International Conference on Algorithmic Learning Theory (ALT), 2022
Naman Agarwal, Brian Bullins, Karan Singh
12 Dec 2022

SoftTreeMax: Policy Gradient with Tree Search
Gal Dalal, Assaf Hallak, Shie Mannor, Gal Chechik
28 Sep 2022

On the Reuse Bias in Off-Policy Reinforcement Learning
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Chengyang Ying, Zhongkai Hao, Xinning Zhou, Hang Su, Dong Yan, Jun Zhu
15 Sep 2022

Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm
AAAI Conference on Artificial Intelligence (AAAI), 2022
Qinbo Bai, Amrit Singh Bedi, Vaneet Aggarwal
12 Jun 2022

Stochastic Second-Order Methods Improve Best-Known Sample Complexity of SGD for Gradient-Dominated Function
Neural Information Processing Systems (NeurIPS), 2022
Saeed Masiha, Saber Salehkaleybar, Niao He, Negar Kiyavash, Patrick Thiran
25 May 2022

Momentum-Based Policy Gradient with Second-Order Information
Saber Salehkaleybar, Sadegh Khorasani, Negar Kiyavash, Niao He, Patrick Thiran
17 May 2022

PAGE-PG: A Simple and Loopless Variance-Reduced Policy Gradient Method with Probabilistic Gradient Estimation
International Conference on Machine Learning (ICML), 2022
Matilde Gargiani, Andrea Zanelli, Andrea Martinelli, Tyler H. Summers, John Lygeros
01 Feb 2022
Optimal Estimation of Off-Policy Policy Gradient via Double Fitted Iteration
Chengzhuo Ni, Ruiqi Zhang, Xiang Ji, Xuezhou Zhang, Mengdi Wang
31 Jan 2022

MDPGT: Momentum-based Decentralized Policy Gradient Tracking
AAAI Conference on Artificial Intelligence (AAAI), 2021
Zhanhong Jiang, Xian Yeow Lee, Sin Yong Tan, Kai Liang Tan, Aditya Balu, Young M. Lee, Chinmay Hegde, Soumik Sarkar
06 Dec 2021

Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings
AAAI Conference on Artificial Intelligence (AAAI), 2021
Matthew Shunshi Zhang, Murat A. Erdogdu, Animesh Garg
30 Oct 2021

Understanding the Effect of Stochasticity in Policy Optimization
Neural Information Processing Systems (NeurIPS), 2021
Jincheng Mei, Bo Dai, Chenjun Xiao, Csaba Szepesvári, Dale Schuurmans
29 Oct 2021

Beyond Exact Gradients: Convergence of Stochastic Soft-Max Policy Gradient Methods with Entropy Regularization
Yuhao Ding, Junzi Zhang, Hyunin Lee, Javad Lavaei
19 Oct 2021

On the Global Optimum Convergence of Momentum-based Policy Gradient
Yuhao Ding, Junzi Zhang, Javad Lavaei
19 Oct 2021

Concave Utility Reinforcement Learning with Zero-Constraint Violations
Mridul Agarwal, Qinbo Bai, Vaneet Aggarwal
12 Sep 2021
A general sample complexity analysis of vanilla policy gradient
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021
Rui Yuan, Robert Mansel Gower, A. Lazaric
23 Jul 2021

Bregman Gradient Policy Optimization
Feihu Huang, Shangqian Gao, Heng-Chiao Huang
23 Jun 2021

Joint Optimization of Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm
Journal of Artificial Intelligence Research (JAIR), 2021
Qinbo Bai, Mridul Agarwal, Vaneet Aggarwal
28 May 2021

Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence
SIAM Journal on Optimization (SIAM J. Optim.), 2021
Wenhao Zhan, Shicong Cen, Baihe Huang, Yuxin Chen, Jason D. Lee, Yuejie Chi
24 May 2021