2505.22617
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
28 May 2025
Ganqu Cui
Yuchen Zhang
Jiacheng Chen
Lifan Yuan
Zhi Wang
Yuxin Zuo
Haozhan Li
Wendi Li
Huayu Chen
Weize Chen
Zhiyuan Liu
Yuan Yao
Lei Bai
Wanli Ouyang
Yu Cheng
Bowen Zhou
Ning Ding
LRM
ArXiv (abs)
PDF
HTML
HuggingFace (125 upvotes)
Papers citing
"The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models"
50 / 159 papers shown
Rectifying LLM Thought from Lens of Optimization
J. Liu
Hongwei Liu
Songyang Zhang
Kai Chen
LRM
128
1
0
01 Dec 2025
Beware of Reasoning Overconfidence: Pitfalls in the Reasoning Process for Multi-solution Tasks
Jiannan Guan
Qiguang Chen
L. Qin
Dengyun Peng
Jinhao Liu
Liangyu Huo
Jian Xie
Wanxiang Che
LRM
156
0
0
01 Dec 2025
Beyond High-Entropy Exploration: Correctness-Aware Low-Entropy Segment-Based Advantage Shaping for Reasoning LLMs
Xinzhu Chen
Xuesheng Li
Zhongxiang Sun
Weijie Yu
LRM
108
1
0
30 Nov 2025
G-KV: Decoding-Time KV Cache Eviction with Global Attention
Mengqi Liao
Lu Wang
Chaoyun Zhang
Zekai Shen
Xiaowei Mao
Si Qin
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
Huaiyu Wan
78
0
0
29 Nov 2025
Differential Smoothing Mitigates Sharpening and Improves LLM Reasoning
Jingchu Gai
Guanning Zeng
Huaqing Zhang
Aditi Raghunathan
110
0
0
25 Nov 2025
Syn-GRPO: Self-Evolving Data Synthesis for MLLM Perception Reasoning
Qihan Huang
H. Zhang
Rong Wei
Yi Wang
Rui Tang
Mingli Song
Jie Song
134
0
0
24 Nov 2025
EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control
Kai Yang
Xin Xu
Yangkun Chen
Weijie Liu
Jiafei Lyu
Zichuan Lin
Deheng Ye
Saiyong Yang
239
1
0
19 Nov 2025
P1: Mastering Physics Olympiads with Reinforcement Learning
Jiacheng Chen
Qianjia Cheng
F. Yu
Haiyuan Wan
Yuchen Zhang
...
Yu Cheng
Ning Ding
Bowen Zhou
Peng Ye
Ganqu Cui
ReLM
LRM
AI4CE
334
1
0
17 Nov 2025
From Exploration to Exploitation: A Two-Stage Entropy RLVR Approach for Noise-Tolerant MLLM Training
Donglai Xu
Hongzheng Yang
Yuzhi Zhao
Pingping Zhang
Jinpeng Chen
...
Xiaolei Li
Senkang Hu
Ziyi Guan
Jason Chun Lok Li
L. Po
142
0
0
11 Nov 2025
FLEX: Continuous Agent Evolution via Forward Learning from Experience
Zhicheng Cai
Xinyuan Guo
Yu Pei
Jiangtao Feng
Jiangjie Chen
Ya Zhang
Wei-Ying Ma
Mingxuan Wang
Hao Zhou
CLL
LLMAG
LRM
290
6
0
09 Nov 2025
What Makes Reasoning Invalid: Echo Reflection Mitigation for Large Language Models
Chen He
Xun Jiang
Lei Wang
Hao-ran Yang
Chong Peng
Peng Yan
Fumin Shen
Xing Xu
LRM
237
0
0
09 Nov 2025
Revisiting Entropy in Reinforcement Learning for Large Reasoning Models
Renren Jin
Pengzhi Gao
Yuqi Ren
Zhuowen Han
Tongxuan Zhang
Wuwei Huang
Wei Liu
Jian Luan
Deyi Xiong
LRM
127
1
0
08 Nov 2025
RLoop: An Self-Improving Framework for Reinforcement Learning with Iterative Policy Initialization
Zeng Zhiyuan
Jiashuo Liu
Zhangyue Yin
Ge Zhang
Wenhao Huang
Xipeng Qiu
159
0
0
06 Nov 2025
Explore Data Left Behind in Reinforcement Learning for Reasoning Language Models
Chenxi Liu
Junjie Liang
Yuqi Jia
Bochuan Cao
Yang Bai
Heng-Chiao Huang
Xun Chen
OffRL
ReLM
LRM
299
2
0
06 Nov 2025
Reg-DPO: SFT-Regularized Direct Preference Optimization with GT-Pair for Improving Video Generation
Jie Du
Xinyu Gong
Qingshan Tan
W. Li
Yangming Cheng
Weitao Wang
Chenlu Zhan
Suhui Wu
H. Zhang
J. Zhang
VGen
371
0
0
03 Nov 2025
Efficient Reinforcement Learning for Large Language Models with Intrinsic Exploration
Yan Sun
Jia Guo
Stanley Kok
Zihao Wang
ZuJie Wen
Zhiqiang Zhang
OffRL
LRM
169
0
0
02 Nov 2025
Do Math Reasoning LLMs Help Predict the Impact of Public Transit Events?
Bowen Fang
Ruijian Zha
Xuan Di
AI4TS
158
0
0
02 Nov 2025
Towards Understanding Self-play for LLM Reasoning
Justin Yang Chae
Md Tanvirul Alam
Nidhi Rastogi
ReLM
LRM
384
2
0
31 Oct 2025
Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning
Md Tanvirul Alam
Nidhi Rastogi
OffRL
LRM
113
2
0
30 Oct 2025
BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning
Qianli Shen
Daoyuan Chen
Yilun Huang
Zhenqing Ling
Yaliang Li
Bolin Ding
Jingren Zhou
OffRL
168
0
0
30 Oct 2025
Do Not Step Into the Same River Twice: Learning to Reason from Trial and Error
Chenming Tang
Hsiu-Yuan Huang
Weijie Liu
Saiyong Yang
Yunfang Wu
OffRL
LRM
151
2
0
30 Oct 2025
Kimi Linear: An Expressive, Efficient Attention Architecture
Kimi Team
Yu Zhang
Zongyu Lin
Xingcheng Yao
J. Hu
...
Guokun Lai
Yuxin Wu
Xinyu Zhou
Zhilin Yang
Yulun Du
143
13
0
30 Oct 2025
Defeating the Training-Inference Mismatch via FP16
Penghui Qi
Zichen Liu
Xiangxin Zhou
Tianyu Pang
Chao Du
Wee Sun Lee
Min Lin
174
8
0
30 Oct 2025
The Best of N Worlds: Aligning Reinforcement Learning with Best-of-N Sampling via max@k Optimisation
Farid Bagirov
Mikhail Arkhipov
Ksenia Sycheva
Evgeniy Glukhov
Egor Bogomolov
117
0
0
27 Oct 2025
Advantage Shaping as Surrogate Reward Maximization: Unifying Pass@K Policy Gradients
Christos Thrampoulidis
Sadegh Mahdavi
Wenlong Deng
OffRL
197
0
0
27 Oct 2025
BoundRL: Efficient Structured Text Segmentation through Reinforced Boundary Generation
Haoyuan Li
Zhengyuan Shen
Sullam Jeoung
Yueyan Chen
Jiayu Li
Qi Zhu
Shuai Wang
V. Ioannidis
Huzefa Rangwala
181
0
0
23 Oct 2025
KL-Regularized Reinforcement Learning is Designed to Mode Collapse
Anthony GX-Chen
Jatin Prakash
Jeff Guo
Rob Fergus
Rajesh Ranganath
140
2
0
23 Oct 2025
GAPO: Robust Advantage Estimation for Real-World Code LLMs
Jianqing Zhang
Zhezheng Hao
Wei Xia
Hande Dong
Hong Wang
Chenxing Wei
Yuyan Zhou
Yubin Qi
Qiang Lin
Jian Cao
245
0
0
22 Oct 2025
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
Zhiheng Xi
Xin Guo
Yang Nan
Enyu Zhou
Junrui Shen
...
Rui Zheng
Hang Yan
Tao Gui
Qi Zhang
Xuanjing Huang
OffRL
183
8
0
21 Oct 2025
Online SFT for LLM Reasoning: Surprising Effectiveness of Self-Tuning without Rewards
Mengqi Li
Lei Zhao
Anthony Man-Cho So
Ruoyu Sun
Xiao Li
ReLM
OffRL
LRM
178
1
0
21 Oct 2025
Local Coherence or Global Validity? Investigating RLVR Traces in Math Domains
Soumya Rani Samineni
Durgesh Kalwar
Vardaan Gangal
Siddhant Bhambri
Subbarao Kambhampati
LRM
93
0
0
20 Oct 2025
The Road Less Traveled: Enhancing Exploration in LLMs via Sequential Sampling
Shijia Kang
Muhan Zhang
LRM
109
0
0
17 Oct 2025
Soundness-Aware Level: A Microscopic Signature that Predicts LLM Reasoning Potential
Xuansheng Wu
Xiaoman Pan
Wenlin Yao
Jianshu Chen
ReLM
LRM
157
0
0
17 Oct 2025
SimKO: Simple Pass@K Policy Optimization
Ruotian Peng
Yi Ren
Zhouliang Yu
Weiyang Liu
Yandong Wen
225
2
0
16 Oct 2025
The Art of Scaling Reinforcement Learning Compute for LLMs
Devvrit Khatri
Lovish Madaan
Rishabh Tiwari
Rachit Bansal
Sai Surya Duvvuri
Manzil Zaheer
Inderjit Dhillon
David Brandfonbrener
Rishabh Agarwal
OffRL
153
15
0
15 Oct 2025
Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization
Yang Li
Z. Dong
Yuhan Sun
Weixun Wang
Shaopan Xiong
...
Han Lu
Jiamang Wang
Wenbo Su
Bo Zheng
Junchi Yan
LRM
113
4
0
15 Oct 2025
DeepPlanner: Scaling Planning Capability for Deep Research Agents via Advantage Shaping
Wei Fan
Wenlin Yao
Zheng Li
Feng Yao
Xin Liu
Liang Qiu
Qingyu Yin
Yangqiu Song
Bing Yin
LLMAG
OffRL
137
1
0
14 Oct 2025
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
Wei Huang
Y. Ge
S. Yang
Yicheng Xiao
Huizi Mao
...
Hongxu Yin
Yao Lu
Xiaojuan Qi
Song Han
Yukang Chen
OffRL
114
0
0
13 Oct 2025
Demystifying Reinforcement Learning in Agentic Reasoning
Zhaochen Yu
Ling Yang
Jiaru Zou
Shuicheng Yan
Mengdi Wang
AI4TS
LRM
269
6
0
13 Oct 2025
MATH-Beyond: A Benchmark for RL to Expand Beyond the Base Model
Prasanna Mayilvahanan
Ricardo Dominguez-Olmedo
Thaddäus Wiedemer
Wieland Brendel
OffRL
AIMat
ReLM
LRM
207
1
0
13 Oct 2025
Rediscovering Entropy Regularization: Adaptive Coefficient Unlocks Its Potential for LLM Reinforcement Learning
Xiaoyun Zhang
Xiaojian Yuan
Di Huang
Wang You
Chen-Hao Hu
Jingqing Ruan
Kejiang Chen
Xing Hu
LRM
198
0
0
13 Oct 2025
From <Answer> to <Think>: Multidimensional Supervision of Reasoning Process for LLM Optimization
Beining Wang
Weihang Su
Hongtao Tian
Tao Yang
Yujia Zhou
Ting Yao
Qingyao Ai
Yiqun Liu
LRM
105
0
0
13 Oct 2025
Unlocking Exploration in RLVR: Uncertainty-aware Advantage Shaping for Deeper Reasoning
Can Xie
Ruotong Pan
Xiangyu Wu
Y. Zhang
Jiayi Fu
Tingting Gao
G. Zhou
OffRL
LRM
151
3
0
12 Oct 2025
One4Many-StablePacker: An Efficient Deep Reinforcement Learning Framework for the 3D Bin Packing Problem
Lei Gao
Shihong Huang
Shengjie Wang
Hong Ma
Feng Zhang
Hengda Bao
Qichang Chen
Weihua Zhou
OffRL
102
0
0
11 Oct 2025
Rethinking Entropy Interventions in RLVR: An Entropy Change Perspective
Zhezheng Hao
Hong Wang
Haoyang Liu
Jian Luo
Jiarui Yu
Hande Dong
Qiang Lin
Can Wang
Jiawei Chen
AAML
99
7
0
11 Oct 2025
RLFR: Extending Reinforcement Learning for LLMs with Flow Environment
Jinghao Zhang
Naishan Zheng
Ruilin Li
Dongzhou Cheng
Zheming Liang
Feng Zhao
Jiaqi Wang
154
0
0
11 Oct 2025
Beyond Surface Reasoning: Unveiling the True Long Chain-of-Thought Capacity of Diffusion Large Language Models
Qiguang Chen
Hanjing Li
L. Qin
Dengyun Peng
Jinhao Liu
Jiangyi Wang
Chengyue Wu
Xie Chen
Yantao Du
Wanxiang Che
LRM
AI4CE
183
1
0
10 Oct 2025
DSPO: Stable and Efficient Policy Optimization for Agentic Search and Reasoning
Chenyang Gu
Yewen Pu
Bruce Yang
Xiaofan Li
Huan Gao
213
0
0
10 Oct 2025
Pinpointing crucial steps: Attribution-based Credit Assignment for Verifiable Reinforcement Learning
Junxi Yin
Haisen Luo
Zhenyu Li
Yihua Liu
Dan Liu
Zequn Li
Xiaohang Xu
108
0
0
10 Oct 2025
Mobile Gamer Lifetime Value Prediction via Objective Decomposition and Reconstruction
Tianwei Li
Yu Zhao
Yunze Li
Sheng Li
126
0
0
09 Oct 2025