2102.08607
On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method
Neural Information Processing Systems (NeurIPS), 2021
17 February 2021
Junyu Zhang
Chengzhuo Ni
Zheng Yu
Csaba Szepesvári
Mengdi Wang
Papers citing "On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method"
47 / 47 papers shown
Bayesian Risk-Sensitive Policy Optimization For MDPs With General Loss Functions
Xiaoshuang Wang
Yifan Lin
Enlu Zhou
176
0
0
19 Sep 2025
Online Episodic Convex Reinforcement Learning
B. Moreno
Khaled Eldowa
Pierre Gaillard
Margaux Brégère
Nadia Oudjane
OffRL
331
0
0
12 May 2025
Robo-taxi Fleet Coordination at Scale via Reinforcement Learning
Luigi Tresca
Carolin Schmidt
James Harrison
Filipe Rodrigues
G. Zardini
Daniele Gammelli
Marco Pavone
398
7
0
08 Apr 2025
Enhancing PPO with Trajectory-Aware Hybrid Policies
Qisai Liu
Zhanhong Jiang
Hsin-Jung Yang
Mahsa Khosravi
Joshua R. Waite
Soumik Sarkar
294
1
0
21 Feb 2025
Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates
Neural Information Processing Systems (NeurIPS), 2025
Jincheng Mei
Bo Dai
Alekh Agarwal
Sharan Vaswani
Anant Raj
Csaba Szepesvári
Dale Schuurmans
374
1
0
11 Feb 2025
Transformer-based Model Predictive Control: Trajectory Optimization via Sequence Modeling
IEEE Robotics and Automation Letters (RA-L), 2024
Davide Celestini
Daniele Gammelli
T. Guffanti
Simone D'Amico
Elisa Capello
Marco Pavone
295
34
0
31 Oct 2024
From Gradient Clipping to Normalization for Heavy Tailed SGD
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Florian Hübler
Ilyas Fatkhullin
Niao He
453
39
0
17 Oct 2024
Last-Iterate Convergence of General Parameterized Policies in Constrained MDPs
Washim Uddin Mondal
Vaneet Aggarwal
310
1
0
21 Aug 2024
MetaCURL: Non-stationary Concave Utility Reinforcement Learning
B. Moreno
Margaux Brégère
Pierre Gaillard
Nadia Oudjane
OffRL
266
3
0
30 May 2024
Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2024
Wenjia Meng
Qian Zheng
Long Yang
Yilong Yin
Gang Pan
OffRL
238
0
0
04 May 2024
Global Convergence Guarantees for Federated Policy Gradient Methods with Adversaries
Swetha Ganesh
Jiayu Chen
Gugan Thoppe
Vaneet Aggarwal
FedML
353
5
0
15 Mar 2024
Taming Nonconvex Stochastic Mirror Descent with General Bregman Divergence
Ilyas Fatkhullin
Niao He
321
14
0
27 Feb 2024
Stochastic Gradient Succeeds for Bandits
Jincheng Mei
Zixin Zhong
Bo Dai
Alekh Agarwal
Csaba Szepesvári
Dale Schuurmans
267
2
0
27 Feb 2024
MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences
Souradip Chakraborty
Jiahao Qiu
Hui Yuan
Alec Koppel
Furong Huang
Dinesh Manocha
Amrit Singh Bedi
Mengdi Wang
ALM
219
29
0
14 Feb 2024
On the Stochastic (Variance-Reduced) Proximal Gradient Method for Regularized Expected Reward Optimization
Ling Liang
Haizhao Yang
213
1
0
23 Jan 2024
Global Convergence of Natural Policy Gradient with Hessian-aided Momentum Variance Reduction
Journal of Scientific Computing (J. Sci. Comput.), 2024
Jie Feng
Ke Wei
Jinchi Chen
390
4
0
02 Jan 2024
Efficiently Escaping Saddle Points for Policy Optimization
Conference on Uncertainty in Artificial Intelligence (UAI), 2023
Sadegh Khorasani
Saber Salehkaleybar
Negar Kiyavash
Niao He
Matthias Grossglauser
284
1
0
15 Nov 2023
Improved Sample Complexity Analysis of Natural Policy Gradient Algorithm with General Parameterization for Infinite Horizon Discounted Reward Markov Decision Processes
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
Washim Uddin Mondal
Vaneet Aggarwal
282
20
0
18 Oct 2023
Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes
AAAI Conference on Artificial Intelligence (AAAI), 2023
Qinbo Bai
Washim Uddin Mondal
Vaneet Aggarwal
349
22
0
05 Sep 2023
An Adaptive Optimization Approach to Personalized Financial Incentives in Mobile Behavioral Weight Loss Interventions
Qiaomei Li
Kara L. Gavin
Corrine L. Voils
Yonatan Dov Mintz
236
1
0
01 Jul 2023
Reinforcement Learning with General Utilities: Simpler Variance Reduction and Large State-Action Space
International Conference on Machine Learning (ICML), 2023
Anas Barakat
Ilyas Fatkhullin
Niao He
240
17
0
02 Jun 2023
Scalable Primal-Dual Actor-Critic Method for Safe Multi-Agent RL with General Utilities
Neural Information Processing Systems (NeurIPS), 2023
Donghao Ying
Yunkai Zhang
Yuhao Ding
Alec Koppel
Javad Lavaei
387
22
0
27 May 2023
Regret-Optimal Model-Free Reinforcement Learning for Discounted MDPs with Short Burn-In Time
Neural Information Processing Systems (NeurIPS), 2023
Xiang Ji
Gen Li
OffRL
391
8
0
24 May 2023
Instruction Tuned Models are Quick Learners
Himanshu Gupta
Saurabh Arjun Sawant
Swaroop Mishra
Mutsumi Nakamura
Arindam Mitra
Santosh Mashetty
Chitta Baral
298
30
0
17 May 2023
Scalable Multi-Agent Reinforcement Learning with General Utilities
American Control Conference (ACC), 2023
Donghao Ying
Yuhao Ding
Alec Koppel
Javad Lavaei
250
2
0
15 Feb 2023
Stochastic Policy Gradient Methods: Improved Sample Complexity for Fisher-non-degenerate Policies
International Conference on Machine Learning (ICML), 2023
Ilyas Fatkhullin
Anas Barakat
Anastasia Kireeva
Niao He
439
57
0
03 Feb 2023
A Novel Framework for Policy Mirror Descent with General Parameterization and Linear Convergence
Neural Information Processing Systems (NeurIPS), 2023
Carlo Alfano
Rui Yuan
Patrick Rebeschini
632
22
0
30 Jan 2023
Stochastic Dimension-reduced Second-order Methods for Policy Optimization
Jinsong Liu
Chen Xie
Qinwen Deng
Dongdong Ge
Yi-Li Ye
126
1
0
28 Jan 2023
The Role of Baselines in Policy Gradient Optimization
Neural Information Processing Systems (NeurIPS), 2023
Jincheng Mei
Wesley Chung
Valentin Thomas
Bo Dai
Csaba Szepesvári
Dale Schuurmans
280
26
0
16 Jan 2023
Variance-Reduced Conservative Policy Iteration
International Conference on Algorithmic Learning Theory (ALT), 2022
Naman Agarwal
Brian Bullins
Karan Singh
219
3
0
12 Dec 2022
SoftTreeMax: Policy Gradient with Tree Search
Gal Dalal
Assaf Hallak
Shie Mannor
Gal Chechik
168
1
0
28 Sep 2022
On the Reuse Bias in Off-Policy Reinforcement Learning
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Chengyang Ying
Zhongkai Hao
Xinning Zhou
Hang Su
Dong Yan
Jun Zhu
OffRL
237
5
0
15 Sep 2022
Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm
AAAI Conference on Artificial Intelligence (AAAI), 2022
Qinbo Bai
Amrit Singh Bedi
Vaneet Aggarwal
268
27
0
12 Jun 2022
Stochastic Second-Order Methods Improve Best-Known Sample Complexity of SGD for Gradient-Dominated Function
Neural Information Processing Systems (NeurIPS), 2022
Saeed Masiha
Saber Salehkaleybar
Niao He
Negar Kiyavash
Patrick Thiran
357
21
0
25 May 2022
Momentum-Based Policy Gradient with Second-Order Information
Saber Salehkaleybar
Sadegh Khorasani
Negar Kiyavash
Niao He
Patrick Thiran
320
13
0
17 May 2022
PAGE-PG: A Simple and Loopless Variance-Reduced Policy Gradient Method with Probabilistic Gradient Estimation
International Conference on Machine Learning (ICML), 2022
Matilde Gargiani
Andrea Zanelli
Andrea Martinelli
Tyler H. Summers
John Lygeros
168
17
0
01 Feb 2022
Optimal Estimation of Off-Policy Policy Gradient via Double Fitted Iteration
Chengzhuo Ni
Ruiqi Zhang
Xiang Ji
Xuezhou Zhang
Mengdi Wang
OffRL
358
1
0
31 Jan 2022
MDPGT: Momentum-based Decentralized Policy Gradient Tracking
AAAI Conference on Artificial Intelligence (AAAI), 2021
Zhanhong Jiang
Xian Yeow Lee
Sin Yong Tan
Kai Liang Tan
Aditya Balu
Young M. Lee
Chinmay Hegde
Soumik Sarkar
205
11
0
06 Dec 2021
Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings
AAAI Conference on Artificial Intelligence (AAAI), 2021
Matthew Shunshi Zhang
Murat A. Erdogdu
Animesh Garg
417
6
0
30 Oct 2021
Understanding the Effect of Stochasticity in Policy Optimization
Neural Information Processing Systems (NeurIPS), 2021
Jincheng Mei
Bo Dai
Chenjun Xiao
Csaba Szepesvári
Dale Schuurmans
255
20
0
29 Oct 2021
Beyond Exact Gradients: Convergence of Stochastic Soft-Max Policy Gradient Methods with Entropy Regularization
Yuhao Ding
Junzi Zhang
Hyunin Lee
Javad Lavaei
471
23
0
19 Oct 2021
On the Global Optimum Convergence of Momentum-based Policy Gradient
Yuhao Ding
Junzi Zhang
Javad Lavaei
365
26
0
19 Oct 2021
Concave Utility Reinforcement Learning with Zero-Constraint Violations
Mridul Agarwal
Qinbo Bai
Vaneet Aggarwal
374
15
0
12 Sep 2021
A general sample complexity analysis of vanilla policy gradient
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021
Rui Yuan
Robert Mansel Gower
A. Lazaric
487
87
0
23 Jul 2021
Bregman Gradient Policy Optimization
Feihu Huang
Shangqian Gao
Heng-Chiao Huang
480
19
0
23 Jun 2021
Joint Optimization of Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm
Journal of Artificial Intelligence Research (JAIR), 2021
Qinbo Bai
Mridul Agarwal
Vaneet Aggarwal
130
8
0
28 May 2021
Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence
SIAM Journal on Optimization (SIAM J. Optim.), 2021
Wenhao Zhan
Shicong Cen
Baihe Huang
Yuxin Chen
Jason D. Lee
Yuejie Chi
393
92
0
24 May 2021