The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models

28 May 2025

arXiv 2505.22617 (abs, PDF, HTML) · HuggingFace (125 upvotes)

Papers citing "The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models"

50 / 159 papers shown

Rectifying LLM Thought from Lens of Optimization

128

1

0

01 Dec 2025

Beware of Reasoning Overconfidence: Pitfalls in the Reasoning Process for Multi-solution Tasks

156

0

0

01 Dec 2025

Beyond High-Entropy Exploration: Correctness-Aware Low-Entropy Segment-Based Advantage Shaping for Reasoning LLMs

108

1

0

30 Nov 2025

G-KV: Decoding-Time KV Cache Eviction with Global Attention

Saravan Rajmohan

78

0

0

29 Nov 2025

Differential Smoothing Mitigates Sharpening and Improves LLM Reasoning

Aditi Raghunathan

110

0

0

25 Nov 2025

Syn-GRPO: Self-Evolving Data Synthesis for MLLM Perception Reasoning

134

0

0

24 Nov 2025

EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control

239

1

0

19 Nov 2025

P1: Mastering Physics Olympiads with Reinforcement Learning

...

334

1

0

17 Nov 2025

From Exploration to Exploitation: A Two-Stage Entropy RLVR Approach for Noise-Tolerant MLLM Training

...

Jason Chun Lok Li

142

0

0

11 Nov 2025

FLEX: Continuous Agent Evolution via Forward Learning from Experience

Hao Zhou

290

6

0

09 Nov 2025

What Makes Reasoning Invalid: Echo Reflection Mitigation for Large Language Models

237

0

0

09 Nov 2025

Revisiting Entropy in Reinforcement Learning for Large Reasoning Models

127

1

0

08 Nov 2025

RLoop: An Self-Improving Framework for Reinforcement Learning with Iterative Policy Initialization

159

0

0

06 Nov 2025

Explore Data Left Behind in Reinforcement Learning for Reasoning Language Models

Heng-Chiao Huang

299

2

0

06 Nov 2025

Reg-DPO: SFT-Regularized Direct Preference Optimization with GT-Pair for Improving Video Generation

371

0

0

03 Nov 2025

Efficient Reinforcement Learning for Large Language Models with Intrinsic Exploration

169

0

0

02 Nov 2025

Do Math Reasoning LLMs Help Predict the Impact of Public Transit Events?

158

0

0

02 Nov 2025

Towards Understanding Self-play for LLM Reasoning

Justin Yang Chae

Md Tanvirul Alam

384

2

0

31 Oct 2025

Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning

Md Tanvirul Alam

113

2

0

30 Oct 2025

BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning

168

0

0

30 Oct 2025

Do Not Step Into the Same River Twice: Learning to Reason from Trial and Error

Hsiu-Yuan Huang

Yunfang Wu

151

2

0

30 Oct 2025

Kimi Linear: An Expressive, Efficient Attention Architecture

...

143

13

0

30 Oct 2025

Defeating the Training-Inference Mismatch via FP16

174

8

0

30 Oct 2025

The Best of N Worlds: Aligning Reinforcement Learning with Best-of-N Sampling via max@k Optimisation

Mikhail Arkhipov

Evgeniy Glukhov

117

0

0

27 Oct 2025

Advantage Shaping as Surrogate Reward Maximization: Unifying Pass@K Policy Gradients

Christos Thrampoulidis

197

0

0

27 Oct 2025

BoundRL: Efficient Structured Text Segmentation through Reinforced Boundary Generation

Huzefa Rangwala

181

0

0

23 Oct 2025

KL-Regularized Reinforcement Learning is Designed to Mode Collapse

Anthony GX-Chen

Rajesh Ranganath

140

2

0

23 Oct 2025

GAPO: Robust Advantage Estimation for Real-World Code LLMs

245

0

0

22 Oct 2025

BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping

...

183

8

0

21 Oct 2025

Online SFT for LLM Reasoning: Surprising Effectiveness of Self-Tuning without Rewards

Anthony Man-Cho So

178

1

0

21 Oct 2025

Local Coherence or Global Validity? Investigating RLVR Traces in Math Domains

Soumya Rani Samineni

Siddhant Bhambri

Subbarao Kambhampati

93

0

0

20 Oct 2025

The Road Less Traveled: Enhancing Exploration in LLMs via Sequential Sampling

109

0

0

17 Oct 2025

Soundness-Aware Level: A Microscopic Signature that Predicts LLM Reasoning Potential

157

0

0

17 Oct 2025

SimKO: Simple Pass@K Policy Optimization

225

2

0

16 Oct 2025

The Art of Scaling Reinforcement Learning Compute for LLMs

Sai Surya Duvvuri

Inderjit Dhillon

David Brandfonbrener

Rishabh Agarwal

153

15

0

15 Oct 2025

Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization

...

113

4

0

15 Oct 2025

DeepPlanner: Scaling Planning Capability for Deep Research Agents via Advantage Shaping

137

1

0

14 Oct 2025

QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

...

114

0

0

13 Oct 2025

Demystifying Reinforcement Learning in Agentic Reasoning

269

6

0

13 Oct 2025

MATH-Beyond: A Benchmark for RL to Expand Beyond the Base Model

Prasanna Mayilvahanan

Ricardo Dominguez-Olmedo

Thaddäus Wiedemer

Wieland Brendel

OffRL AIMat ReLM LRM

207

1

0

13 Oct 2025

Rediscovering Entropy Regularization: Adaptive Coefficient Unlocks Its Potential for LLM Reinforcement Learning

198

0

0

13 Oct 2025

From <Answer> to <Think>: Multidimensional Supervision of Reasoning Process for LLM Optimization

105

0

0

13 Oct 2025

Unlocking Exploration in RLVR: Uncertainty-aware Advantage Shaping for Deeper Reasoning

151

3

0

12 Oct 2025

One4Many-StablePacker: An Efficient Deep Reinforcement Learning Framework for the 3D Bin Packing Problem

102

0

0

11 Oct 2025

Rethinking Entropy Interventions in RLVR: An Entropy Change Perspective

99

7

0

11 Oct 2025

RLFR: Extending Reinforcement Learning for LLMs with Flow Environment

154

0

0

11 Oct 2025

Beyond Surface Reasoning: Unveiling the True Long Chain-of-Thought Capacity of Diffusion Large Language Models

183

1

0

10 Oct 2025

DSPO: Stable and Efficient Policy Optimization for Agentic Search and Reasoning

213

0

0

10 Oct 2025

Pinpointing crucial steps: Attribution-based Credit Assignment for Verifiable Reinforcement Learning

108

0

0

10 Oct 2025

Mobile Gamer Lifetime Value Prediction via Objective Decomposition and Reconstruction

126

0

0

09 Oct 2025
