ResearchTrend.AI
RL's Razor: Why Online Reinforcement Learning Forgets Less
arXiv: 2509.04259

4 September 2025
Idan Shenfeld
Jyothish Pari
Pulkit Agrawal
    CLL
ArXiv (abs) | PDF | HTML | HuggingFace (4 upvotes)

Papers citing "RL's Razor: Why Online Reinforcement Learning Forgets Less"

18 / 18 papers shown
How to Teach Large Multimodal Models New Skills
Zhen Zhu
Yiming Gong
Yao Xiao
Yaoyao Liu
Derek Hoiem
MLLM, CLL, KELM
16
0
0
09 Oct 2025
Deterministic algorithms for inhomogeneous Bernoulli trials: Shapley value of network devices
Jesse D Wei
Guo Wei
FAtt
36
0
0
08 Oct 2025
Learning on the Job: Test-Time Curricula for Targeted Reinforcement Learning
Jonas Hübotter
Leander Diaz-Bone
Ido Hakimi
Andreas Krause
Moritz Hardt
32
0
0
06 Oct 2025
On Predictability of Reinforcement Learning Dynamics for Large Language Models
Yuchen Cai
Ding Cao
Xin Xu
Zijun Yao
Yuqing Huang
Zhenyu Tan
Benyi Zhang
Guiquan Liu
Junfeng Fang
24
0
0
01 Oct 2025
One-Token Rollout: Guiding Supervised Fine-Tuning of LLMs with Policy Gradient
Rui Ming
Haoyuan Wu
Shoubo Hu
Zhuolun He
Bei Yu
OffRL, LRM
2
0
0
30 Sep 2025
Nudging the Boundaries of LLM Reasoning
Justin Chih-Yao Chen
Becky Xiangyu Peng
Prafulla Kumar Choubey
Kung-Hsiang Huang
Jiaxin Zhang
Mohit Bansal
Chien-Sheng Wu
LRM
8
0
0
30 Sep 2025
Debunk the Myth of SFT Generalization
Xiaofeng Lin
Hejian Sang
Zhipeng Wang
Xuezhou Zhang
OffRL, LRM
5
0
0
30 Sep 2025
PIPer: On-Device Environment Setup via Online Reinforcement Learning
Alexander Kovrigin
Aleksandra V. Eliseeva
Konstantin Grotov
Egor Bogomolov
Yaroslav Zharov
OffRL
36
0
0
29 Sep 2025
When Greedy Wins: Emergent Exploitation Bias in Meta-Bandit LLM Training
Sanxing Chen
Xiaoyin Chen
Yukun Huang
Roy Xie
Bhuwan Dhingra
20
0
0
29 Sep 2025
How LLMs Learn to Reason: A Complex Network Perspective
Sihan Hu
X-D Cai
Yuan Huang
Zhiyuan Yao
Linfeng Zhang
Pan Zhang
Youjin Deng
Kun Chen
LRM
44
0
0
28 Sep 2025
Why Alignment Must Precede Distillation: A Minimal Working Explanation
Sungmin Cha
Kyunghyun Cho
20
0
0
28 Sep 2025
IA2: Alignment with ICL Activations Improves Supervised Fine-Tuning
Aayush Mishra
Daniel Khashabi
Anqi Liu
72
0
0
26 Sep 2025
Variational Reasoning for Language Models
Xiangxin Zhou
Zichen Liu
Haonan Wang
Chao Du
Min Lin
Chongxuan Li
Liang Wang
Tianyu Pang
OffRL, LRM
20
0
0
26 Sep 2025
When Does Reasoning Matter? A Controlled Study of Reasoning's Contribution to Model Performance
Nicolas Boizard
Hippolyte Gisserot-Boukhlef
Kevin El Haddad
Céline Hudelot
Pierre Colombo
ReLM, LRM
17
0
0
26 Sep 2025
SFT Doesn't Always Hurt General Capabilities: Revisiting Domain-Specific Fine-Tuning in LLMs
J. Lin
Zhongruo Wang
Kun Qian
Tian Wang
Arvind Srinivasan
...
Weiqi Zhang
Sujay Sanghavi
C. L. P. Chen
Hyokun Yun
Lihong Li
CLL
50
0
0
25 Sep 2025
RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs
Kohsei Matsutani
Shota Takashiro
Gouki Minegishi
Takeshi Kojima
Yusuke Iwasawa
Yutaka Matsuo
OffRL, ReLM, LRM
94
1
0
25 Sep 2025
Reinforcement Learning on Pre-Training Data
Siheng Li
Kejiao Li
Zenan Xu
Guanhua Huang
Evander Yang
...
Jianchen Zhu
W. Lam
Wayyt Wang
Bo Zhou
Di Wang
OffRL, LRM
38
1
0
23 Sep 2025
Self-Adapting Language Models
Adam Zweiger
Jyothish Pari
Han Guo
Ekin Akyürek
Yoon Kim
Pulkit Agrawal
KELM, LRM
296
10
0
12 Jun 2025