On the Global Convergence Rates of Softmax Policy Gradient Methods

13 May 2020
Jincheng Mei
Chenjun Xiao
Csaba Szepesvári
Dale Schuurmans

Papers citing "On the Global Convergence Rates of Softmax Policy Gradient Methods"

50 / 185 papers shown
Minimisation of Quasar-Convex Functions Using Random Zeroth-Order Oracles
Amir Ali Farzin
Yuen-Man Pun
Iman Shames
31
0
0
04 May 2025
Ordering-based Conditions for Global Convergence of Policy Gradient Methods
Jincheng Mei
Bo Dai
Alekh Agarwal
Mohammad Ghavamzadeh
Csaba Szepesvári
Dale Schuurmans
55
4
0
02 Apr 2025
Analysis of On-policy Policy Gradient Methods under the Distribution Mismatch
Weizhen Wang
Jianping He
Xiaoming Duan
32
0
0
28 Mar 2025
Efficient Learning for Entropy-Regularized Markov Decision Processes via Multilevel Monte Carlo
Matthieu Meunier
C. Reisinger
Yufei Zhang
39
0
0
27 Mar 2025
Larger or Smaller Reward Margins to Select Preferences for Alignment?
Kexin Huang
Junkang Wu
Ziqian Chen
Xue Wang
Jinyang Gao
Bolin Ding
Jiancan Wu
Xiangnan He
X. Wang
42
0
0
25 Feb 2025
Incentivize without Bonus: Provably Efficient Model-based Online Multi-agent RL for Markov Games
Tong Yang
Bo Dai
Lin Xiao
Yuejie Chi
OffRL
56
2
0
13 Feb 2025
Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates
Jincheng Mei
Bo Dai
Alekh Agarwal
Sharan Vaswani
Anant Raj
Csaba Szepesvári
Dale Schuurmans
87
0
0
11 Feb 2025
On Penalty-based Bilevel Gradient Descent Method
Han Shen
Quan-Wu Xiao
Tianyi Chen
60
50
0
08 Jan 2025
Structure Matters: Dynamic Policy Gradient
Sara Klein
Xiangyuan Zhang
Tamer Basar
Simon Weissmann
Leif Döring
35
0
0
07 Nov 2024
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
OffRL
57
3
0
07 Nov 2024
Embedding Safety into RL: A New Take on Trust Region Methods
Nikola Milosevic
Johannes Müller
Nico Scherf
20
1
0
05 Nov 2024
Risk-sensitive control as inference with Rényi divergence
Kaito Ito
Kenji Kashima
29
1
0
04 Nov 2024
Improved Sample Complexity for Global Convergence of Actor-Critic Algorithms
Navdeep Kumar
Priyank Agrawal
Giorgia Ramponi
Kfir Y. Levy
Shie Mannor
33
0
0
11 Oct 2024
The Crucial Role of Samplers in Online Direct Preference Optimization
Ruizhe Shi
Runlong Zhou
Simon S. Du
53
8
0
29 Sep 2024
Towards Fast Rates for Federated and Multi-Task Reinforcement Learning
Feng Zhu
Robert W. Heath Jr.
Aritra Mitra
35
1
0
09 Sep 2024
Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form
Toshinori Kitamura
Tadashi Kozuno
Wataru Kumagai
Kenta Hoshino
Y. Hosoe
Kazumi Kasaura
Masashi Hamaya
Paavo Parmas
Yutaka Matsuo
70
0
0
29 Aug 2024
Exploiting Approximate Symmetry for Efficient Multi-Agent Reinforcement Learning
Batuhan Yardim
Niao He
AI4CE
41
5
0
27 Aug 2024
q-exponential family for policy optimization
Lingwei Zhu
Haseeb Shah
Han Wang
Yukie Nagai
Martha White
OffRL
73
0
0
14 Aug 2024
Maximum Entropy On-Policy Actor-Critic via Entropy Advantage Estimation
Jean Seong Bjorn Choe
Jong-Kook Kim
38
2
0
25 Jul 2024
Functional Acceleration for Policy Mirror Descent
Veronica Chelu
Doina Precup
28
0
0
23 Jul 2024
Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning
Alessandro Montenegro
Marco Mussi
Matteo Papini
Alberto Maria Metelli
BDL
38
1
0
15 Jul 2024
Optimizing Novelty of Top-k Recommendations using Large Language Models and Reinforcement Learning
Amit Sharma
Hua Li
Xue Li
Jian Jiao
LRM
29
0
0
20 Jun 2024
A Generalized Version of Chung's Lemma and its Applications
Li Jiang
Xiao Li
Andre Milzarek
Junwen Qiu
40
1
0
09 Jun 2024
Optimal Rates of Convergence for Entropy Regularization in Discounted Markov Decision Processes
Johannes Muller
Semih Cayci
31
0
0
06 Jun 2024
Towards the Transferability of Rewards Recovered via Regularized Inverse Reinforcement Learning
Andreas Schlaginhaufen
Maryam Kamgarpour
OffRL
23
1
0
03 Jun 2024
Bilevel reinforcement learning via the development of hyper-gradient without lower-level convexity
Yan Yang
Bin Gao
Ya-xiang Yuan
36
2
0
30 May 2024
A CMDP-within-online framework for Meta-Safe Reinforcement Learning
Vanshaj Khattar
Yuhao Ding
Bilgehan Sel
Javad Lavaei
Ming Jin
OffRL
27
12
0
26 May 2024
Policy Gradient Methods for Risk-Sensitive Distributional Reinforcement Learning with Provable Convergence
Minheng Xiao
Xian Yu
Lei Ying
32
2
0
23 May 2024
Almost sure convergence rates of stochastic gradient methods under gradient domination
Simon Weissmann
Sara Klein
Waïss Azizian
Leif Döring
28
3
0
22 May 2024
Fast Two-Time-Scale Stochastic Gradient Method with Applications in Reinforcement Learning
Sihan Zeng
Thinh T. Doan
49
5
0
15 May 2024
Linear Convergence of Independent Natural Policy Gradient in Games with Entropy Regularization
Youbang Sun
Tao-Wen Liu
P. R. Kumar
Shahin Shahrampour
37
0
0
04 May 2024
Learning Optimal Deterministic Policies with Stochastic Policy Gradients
Alessandro Montenegro
Marco Mussi
Alberto Maria Metelli
Matteo Papini
38
2
0
03 May 2024
Convergence of a model-free entropy-regularized inverse reinforcement learning algorithm
Titouan Renard
Andreas Schlaginhaufen
Tingting Ni
Maryam Kamgarpour
51
1
0
25 Mar 2024
Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles
Bhrij Patel
Wesley A. Suttle
Alec Koppel
Vaneet Aggarwal
Brian M. Sadler
Amrit Singh Bedi
Dinesh Manocha
32
1
0
18 Mar 2024
On the Global Convergence of Policy Gradient in Average Reward Markov Decision Processes
Navdeep Kumar
Yashaswini Murthy
Itai Shufaro
Kfir Y. Levy
R. Srikant
Shie Mannor
34
2
0
11 Mar 2024
Provable Policy Gradient Methods for Average-Reward Markov Potential Games
Min Cheng
Ruida Zhou
P. R. Kumar
Chao Tian
49
2
0
09 Mar 2024
Stochastic Gradient Succeeds for Bandits
Jincheng Mei
Zixin Zhong
Bo Dai
Alekh Agarwal
Csaba Szepesvári
Dale Schuurmans
21
1
0
27 Feb 2024
MENTOR: Guiding Hierarchical Reinforcement Learning with Human Feedback and Dynamic Distance Constraint
Xinglin Zhou
Yifu Yuan
Shaofu Yang
Jianye Hao
27
1
0
22 Feb 2024
Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF
Han Shen
Zhuoran Yang
Tianyi Chen
OffRL
32
14
0
10 Feb 2024
On the Complexity of Finite-Sum Smooth Optimization under the Polyak-Łojasiewicz Condition
Yunyan Bai
Yuxing Liu
Luo Luo
15
0
0
04 Feb 2024
Regularized Q-Learning with Linear Function Approximation
Jiachen Xi
Alfredo Garcia
P. Momcilovic
25
2
0
26 Jan 2024
On the Stochastic (Variance-Reduced) Proximal Gradient Method for Regularized Expected Reward Optimization
Ling Liang
Haizhao Yang
14
0
0
23 Jan 2024
Global Convergence of Natural Policy Gradient with Hessian-aided Momentum Variance Reduction
Jie Feng
Ke Wei
Jinchi Chen
23
1
0
02 Jan 2024
PPO-Clip Attains Global Optimality: Towards Deeper Understandings of Clipping
Nai-Chieh Huang
Ping-Chun Hsieh
Kuo-Hao Ho
I-Chen Wu
16
8
0
19 Dec 2023
Can Reinforcement Learning support policy makers? A preliminary study with Integrated Assessment Models
Theodore Wolf
Nantas Nardelli
John Shawe-Taylor
Maria Perez-Ortiz
19
1
0
11 Dec 2023
Onflow: an online portfolio allocation algorithm
G. Turinici
Pierre Brugiere
13
0
0
08 Dec 2023
Fast Policy Learning for Linear Quadratic Control with Entropy Regularization
Xin Guo
Xinyu Li
Renyuan Xu
34
3
0
23 Nov 2023
A Large Deviations Perspective on Policy Gradient Algorithms
Wouter Jongeneel
Daniel Kuhn
Mengmeng Li
11
1
0
13 Nov 2023
On the Second-Order Convergence of Biased Policy Gradient Algorithms
Siqiao Mu
Diego Klabjan
35
2
0
05 Nov 2023
Vanishing Gradients in Reinforcement Finetuning of Language Models
Noam Razin
Hattie Zhou
Omid Saremi
Vimal Thilak
Arwen Bradley
Preetum Nakkiran
Josh Susskind
Etai Littwin
10
7
0
31 Oct 2023