On a few pitfalls in KL divergence gradient estimation for RL
11 June 2025
Yunhao Tang, Rémi Munos
ArXiv (abs) · PDF · HTML

Papers citing "On a few pitfalls in KL divergence gradient estimation for RL"

Showing 6 of 6 papers
Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective
Jingyang Ou, Jiaqi Han, Minkai Xu, Shaoxuan Xu, Jianwen Xie, Stefano Ermon, Yi Wu, Chongxuan Li
DiffM
119 · 0 · 0
03 Dec 2025
KL-Regularized Reinforcement Learning is Designed to Mode Collapse
Anthony GX-Chen, Jatin Prakash, Jeff Guo, Rob Fergus, Rajesh Ranganath
136 · 2 · 0
23 Oct 2025
Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning
Luckeciano C. Melo, Alessandro Abate, Yarin Gal
LRM
101 · 0 · 0
01 Oct 2025
Rethinking Large Language Model Distillation: A Constrained Markov Decision Process Perspective
Matthieu Zimmer, Xiaotong Ji, Tu Nguyen, Haitham Bou-Ammar
89 · 0 · 0
26 Sep 2025
Outcome-based Exploration for LLM Reasoning
Yuda Song, Julia Kempe, Rémi Munos
OffRL, LRM
279 · 30 · 0
08 Sep 2025
Can Large Reasoning Models Self-Train?
Sheikh Shafayat, Fahim Tajwar, Ruslan Salakhutdinov, J. Schneider, Andrea Zanette
ReLM, OffRL, LRM
412 · 21 · 0
27 May 2025