RL's Razor: Why Online Reinforcement Learning Forgets Less
arXiv:2509.04259, 4 September 2025
Idan Shenfeld, Jyothish Pari, Pulkit Agrawal
Tags: CLL
Links: ArXiv (abs), PDF, HTML, HuggingFace (4 upvotes)
Papers citing "RL's Razor: Why Online Reinforcement Learning Forgets Less" (18 of 18 papers shown)
How to Teach Large Multimodal Models New Skills
Zhen Zhu, Yiming Gong, Yao Xiao, Yaoyao Liu, Derek Hoiem
MLLM, CLL, KELM | 16 | 0 | 0 | 09 Oct 2025

Deterministic algorithms for inhomogeneous Bernoulli trials: Shapley value of network devices
Jesse D Wei, Guo Wei
FAtt | 36 | 0 | 0 | 08 Oct 2025

Learning on the Job: Test-Time Curricula for Targeted Reinforcement Learning
Jonas Hübotter, Leander Diaz-Bone, Ido Hakimi, Andreas Krause, Moritz Hardt
32 | 0 | 0 | 06 Oct 2025

On Predictability of Reinforcement Learning Dynamics for Large Language Models
Yuchen Cai, Ding Cao, Xin Xu, Zijun Yao, Yuqing Huang, Zhenyu Tan, Benyi Zhang, Guiquan Liu, Junfeng Fang
24 | 0 | 0 | 01 Oct 2025

One-Token Rollout: Guiding Supervised Fine-Tuning of LLMs with Policy Gradient
Rui Ming, Haoyuan Wu, Shoubo Hu, Zhuolun He, Bei Yu
OffRL, LRM | 2 | 0 | 0 | 30 Sep 2025

Nudging the Boundaries of LLM Reasoning
Justin Chih-Yao Chen, Becky Xiangyu Peng, Prafulla Kumar Choubey, Kung-Hsiang Huang, Jiaxin Zhang, Mohit Bansal, Chien-Sheng Wu
LRM | 8 | 0 | 0 | 30 Sep 2025

Debunk the Myth of SFT Generalization
Xiaofeng Lin, Hejian Sang, Zhipeng Wang, Xuezhou Zhang
OffRL, LRM | 5 | 0 | 0 | 30 Sep 2025

PIPer: On-Device Environment Setup via Online Reinforcement Learning
Alexander Kovrigin, Aleksandra V. Eliseeva, Konstantin Grotov, Egor Bogomolov, Yaroslav Zharov
OffRL | 36 | 0 | 0 | 29 Sep 2025

When Greedy Wins: Emergent Exploitation Bias in Meta-Bandit LLM Training
Sanxing Chen, Xiaoyin Chen, Yukun Huang, Roy Xie, Bhuwan Dhingra
20 | 0 | 0 | 29 Sep 2025

How LLMs Learn to Reason: A Complex Network Perspective
Sihan Hu, X-D Cai, Yuan Huang, Zhiyuan Yao, Linfeng Zhang, Pan Zhang, Youjin Deng, Kun Chen
LRM | 44 | 0 | 0 | 28 Sep 2025

Why Alignment Must Precede Distillation: A Minimal Working Explanation
Sungmin Cha, Kyunghyun Cho
20 | 0 | 0 | 28 Sep 2025

IA2: Alignment with ICL Activations Improves Supervised Fine-Tuning
Aayush Mishra, Daniel Khashabi, Anqi Liu
72 | 0 | 0 | 26 Sep 2025

Variational Reasoning for Language Models
Xiangxin Zhou, Zichen Liu, Haonan Wang, Chao Du, Min Lin, Chongxuan Li, Liang Wang, Tianyu Pang
OffRL, LRM | 20 | 0 | 0 | 26 Sep 2025

When Does Reasoning Matter? A Controlled Study of Reasoning's Contribution to Model Performance
Nicolas Boizard, Hippolyte Gisserot-Boukhlef, Kevin El Haddad, Céline Hudelot, Pierre Colombo
ReLM, LRM | 17 | 0 | 0 | 26 Sep 2025

SFT Doesn't Always Hurt General Capabilities: Revisiting Domain-Specific Fine-Tuning in LLMs
J. Lin, Zhongruo Wang, Kun Qian, Tian Wang, Arvind Srinivasan, ..., Weiqi Zhang, Sujay Sanghavi, C. L. P. Chen, Hyokun Yun, Lihong Li
CLL | 50 | 0 | 0 | 25 Sep 2025

RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs
Kohsei Matsutani, Shota Takashiro, Gouki Minegishi, Takeshi Kojima, Yusuke Iwasawa, Yutaka Matsuo
OffRL, ReLM, LRM | 94 | 1 | 0 | 25 Sep 2025

Reinforcement Learning on Pre-Training Data
Siheng Li, Kejiao Li, Zenan Xu, Guanhua Huang, Evander Yang, ..., Jianchen Zhu, W. Lam, Wayyt Wang, Bo Zhou, Di Wang
OffRL, LRM | 38 | 1 | 0 | 23 Sep 2025

Self-Adapting Language Models
Adam Zweiger, Jyothish Pari, Han Guo, Ekin Akyürek, Yoon Kim, Pulkit Agrawal
KELM, LRM | 296 | 10 | 0 | 12 Jun 2025