On a few pitfalls in KL divergence gradient estimation for RL
11 June 2025
Yunhao Tang, Rémi Munos
ArXiv (abs) · PDF · HTML

Papers citing "On a few pitfalls in KL divergence gradient estimation for RL"

Showing 6 of 6 papers
Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective
Jingyang Ou, Jiaqi Han, Minkai Xu, Shaoxuan Xu, Jianwen Xie, Stefano Ermon, Yi Wu, Chongxuan Li
DiffM
119 · 0 · 0
03 Dec 2025
KL-Regularized Reinforcement Learning is Designed to Mode Collapse
Anthony GX-Chen, Jatin Prakash, Jeff Guo, Rob Fergus, Rajesh Ranganath
136 · 2 · 0
23 Oct 2025
Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning
Luckeciano C. Melo, Alessandro Abate, Yarin Gal
LRM
101 · 0 · 0
01 Oct 2025
Rethinking Large Language Model Distillation: A Constrained Markov Decision Process Perspective
Matthieu Zimmer, Xiaotong Ji, Tu Nguyen, Haitham Bou-Ammar
89 · 0 · 0
26 Sep 2025
Outcome-based Exploration for LLM Reasoning
Yuda Song, Julia Kempe, Rémi Munos
OffRL, LRM
279 · 30 · 0
08 Sep 2025
Can Large Reasoning Models Self-Train?
Sheikh Shafayat, Fahim Tajwar, Ruslan Salakhutdinov, J. Schneider, Andrea Zanette
ReLM, OffRL, LRM
412 · 21 · 0
27 May 2025