2508.05629
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
7 August 2025
Yongliang Wu
Y. Zhou
Zhou Ziheng
Yingzhe Peng
Xinyu Ye
Xinting Hu
Wenbo Zhu
Lu Qi
Ming-Hsuan Yang
Xu Yang
OffRL
LRM
HuggingFace (134 upvotes)
Github (467★)
Papers citing
"On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification"
20 / 20 papers shown
Metis-HOME: Hybrid Optimized Mixture-of-Experts for Multimodal Reasoning
Xiaohan Lan
Fanfan Liu
Haibo Qiu
Siqi Yang
Delian Ruan
Peng Shi
Lin Ma
MoE
LRM
36
0
0
23 Oct 2025
MENTOR: A Reinforcement Learning Framework for Enabling Tool Use in Small Models via Teacher-Optimized Rewards
Changsu Choi
Hoyun Song
Dongyeon Kim
WooHyeon Jung
Minkyung Cho
Sunjin Park
NohHyeob Bae
Seona Yu
Kyungtae Lim
52
0
0
21 Oct 2025
MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models
Yongshun Zhang
Zhongyi Fan
Yonghang Zhang
Zhangzikang Li
Weifeng Chen
Zhongwei Feng
Chaoyue Wang
Peng Hou
Anxiang Zeng
VGen
104
0
0
20 Oct 2025
Holdout-Loss-Based Data Selection for LLM Finetuning via In-Context Learning
Ling Zhang
Xianliang Yang
Juwon Yu
Park Cheonyoung
Lei Song
Jiang Bian
8
0
0
16 Oct 2025
Can GRPO Help LLMs Transcend Their Pretraining Origin?
Kangqi Ni
Zhen Tan
Zijie Liu
Pingzhi Li
Tianlong Chen
OffRL
LRM
28
0
0
14 Oct 2025
Enhancing Large Language Model Reasoning via Selective Critical Token Fine-Tuning
Zhiwen Ruan
Yixia Li
He Zhu
Yun Chen
P. Li
Yang Liu
Guanhua Chen
LRM
34
0
0
13 Oct 2025
Mitigating Forgetting Between Supervised and Reinforcement Learning Yields Stronger Reasoners
Xiangchi Yuan
Xiang Chen
Tong Yu
Dachuan Shi
Can Jin
Wenke Lee
Saayan Mitra
CLL
OffRL
ReLM
LRM
38
0
0
06 Oct 2025
Executable Counterfactuals: Improving LLMs' Causal Reasoning Through Code
Aniket Vashishtha
Qirun Dai
Hongyuan Mei
Amit Sharma
Chenhao Tan
Hao Peng
LRM
79
0
0
02 Oct 2025
On Predictability of Reinforcement Learning Dynamics for Large Language Models
Yuchen Cai
Ding Cao
Xin Xu
Zijun Yao
Yuqing Huang
Zhenyu Tan
Benyi Zhang
Guiquan Liu
Junfeng Fang
56
0
0
01 Oct 2025
Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum
Gaotang Li
Ruizhong Qiu
Xiusi Chen
Heng Ji
Hanghang Tong
32
0
0
01 Oct 2025
One-Token Rollout: Guiding Supervised Fine-Tuning of LLMs with Policy Gradient
Rui Ming
Haoyuan Wu
Shoubo Hu
Zhuolun He
Bei Yu
OffRL
LRM
36
0
0
30 Sep 2025
Debunk the Myth of SFT Generalization
Xiaofeng Lin
Hejian Sang
Zhipeng Wang
Xuezhou Zhang
OffRL
LRM
49
0
0
30 Sep 2025
PIPer: On-Device Environment Setup via Online Reinforcement Learning
Alexander Kovrigin
Aleksandra V. Eliseeva
Konstantin Grotov
Egor Bogomolov
Yaroslav Zharov
OffRL
55
0
0
29 Sep 2025
Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
Chaorui Yao
Yanxi Chen
Yuchang Sun
Yushuo Chen
Wenhao Zhang
Xuchen Pan
Yaliang Li
Bolin Ding
OffRL
24
2
0
29 Sep 2025
Anchored Supervised Fine-Tuning
He Zhu
Junyou Su
Peng Lai
Ren Ma
Wenjia Zhang
L. Yang
Guanhua Chen
OffRL
28
0
0
28 Sep 2025
Variational Reasoning for Language Models
Xiangxin Zhou
Zichen Liu
Haonan Wang
Chao Du
Min Lin
Chongxuan Li
Liang Wang
Tianyu Pang
OffRL
LRM
70
0
0
26 Sep 2025
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning
Yulei Qin
Xiaoyu Tan
Zhengbao He
Gang Li
Haojia Lin
...
Yuzheng Cai
Xuan Zhang
Sheng Ye
Ke Li
Xing Sun
101
0
0
26 Sep 2025
WeFT: Weighted Entropy-driven Fine-Tuning for dLLMs
Guowei Xu
Wenxin Xu
Jiawang Zhao
Kaisheng Ma
DiffM
32
0
0
25 Sep 2025
Proximal Supervised Fine-Tuning
Wenhong Zhu
Ruobing Xie
R. Wang
Xingwu Sun
Di Wang
Pengfei Liu
OffRL
52
2
0
25 Aug 2025
On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
Wenhao Zhang
Yuexiang Xie
Yuchang Sun
Yanxi Chen
Guoyin Wang
Yaliang Li
Bolin Ding
Jingren Zhou
OffRL
56
20
0
15 Aug 2025