2501.17161
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
28 January 2025
Tianzhe Chu
Yuexiang Zhai
Jihan Yang
Shengbang Tong
Saining Xie
Dale Schuurmans
Quoc V. Le
Sergey Levine
Yi-An Ma
OffRL
Papers citing
"SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training"
40 / 40 papers shown
Title
TWIST: Teleoperated Whole-Body Imitation System
Yanjie Ze
Zixuan Chen
Joao Pedro Araujo
Zi-ang Cao
Xue Bin Peng
Jiajun Wu
Chao Liu
30
0
0
05 May 2025
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
Yi-Fan Zhang
Xingyu Lu
X. Hu
Chaoyou Fu
Bin Wen
...
J. Chen
Fan Yang
Z. Zhang
Tingting Gao
Liang Wang
OffRL
LRM
34
0
0
05 May 2025
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Guanghao Zhou
Panjia Qiu
C. L. P. Chen
J. Wang
Zheming Yang
Jian Xu
Minghui Qiu
OffRL
LRM
53
0
0
30 Apr 2025
Reason Like a Radiologist: Chain-of-Thought and Reinforcement Learning for Verifiable Report Generation
Peiyuan Jing
Kinhei Lee
Zhenxuan Zhang
Huichi Zhou
Zhengqing Yuan
Zhifan Gao
Lei Zhu
G. Papanastasiou
Yingying Fang
Guang Yang
MedIm
OffRL
LRM
58
0
0
25 Apr 2025
TTRL: Test-Time Reinforcement Learning
Yuxin Zuo
Kaiyan Zhang
Shang Qu
Li Sheng
Xuekai Zhu
Biqing Qi
Youbang Sun
Ganqu Cui
Ning Ding
Bowen Zhou
OffRL
35
1
0
22 Apr 2025
Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning
Wang Lin
Liyu Jia
Wentao Hu
Kaihang Pan
Zhongqi Yue
Wei Zhao
Jingyuan Chen
Fei Wu
Hanwang Zhang
VGen
44
0
0
22 Apr 2025
Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs
Chun-Hsiao Yeh
Chenyu Wang
Shengbang Tong
Ta-Ying Cheng
Rouyu Wang
Tianzhe Chu
Yuexiang Zhai
Yubei Chen
Shenghua Gao
Yi Ma
LRM
61
0
0
21 Apr 2025
SWE-Synth: Synthesizing Verifiable Bug-Fix Data to Enable Large Language Models in Resolving Real-World Bugs
Minh V.T. Pham
Huy N. Phan
Hoang N. Phan
Cuong Le Chi
T. Nguyen
Nghi D. Q. Bui
SyDa
24
0
0
20 Apr 2025
Relation-R1: Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relational Comprehension
Lin Li
Wei Chen
Jiahui Li
L. Chen
LRM
33
1
0
20 Apr 2025
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
Siyan Zhao
Devaansh Gupta
Qinqing Zheng
Aditya Grover
DiffM
LRM
AI4CE
42
0
0
16 Apr 2025
Slow Thinking for Sequential Recommendation
Junjie Zhang
Beichen Zhang
Wenqi Sun
Hongyu Lu
Wayne Xin Zhao
Yu Chen
Ji-Rong Wen
OffRL
LRM
28
0
0
13 Apr 2025
Playpen: An Environment for Exploring Learning Through Conversational Interaction
Nicola Horst
Davide Mazzaccara
Antonia Schmidt
Michael Sullivan
Filippo Momentè
...
Alexander Koller
Oliver Lemon
David Schlangen
Mario Giulianelli
Alessandro Suglia
OffRL
32
0
0
11 Apr 2025
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
Hardy Chen
Haoqin Tu
Fali Wang
Hui Liu
X. Tang
Xinya Du
Yuyin Zhou
Cihang Xie
ReLM
VLM
OffRL
LRM
57
6
0
10 Apr 2025
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model
Haozhan Shen
Peng Liu
J. Li
Chunxin Fang
Yibo Ma
...
Zilun Zhang
Kangjia Zhao
Qianqian Zhang
Ruochen Xu
Tiancheng Zhao
VLM
LRM
71
0
0
10 Apr 2025
Perception-R1: Pioneering Perception Policy with Reinforcement Learning
En Yu
Kangheng Lin
Liang Zhao
Jisheng Yin
Yana Wei
...
Zheng Ge
Xiangyu Zhang
Daxin Jiang
Jingyu Wang
Wenbing Tao
VLM
OffRL
LRM
35
0
0
10 Apr 2025
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
Rosie Zhao
Alexandru Meterez
Sham Kakade
C. Pehlevan
Samy Jelassi
Eran Malach
ReLM
LRM
38
2
0
10 Apr 2025
Do Reasoning Models Show Better Verbalized Calibration?
Qingcheng Zeng
Weihao Xuan
Leyang Cui
Rob Voigt
LRM
28
0
0
09 Apr 2025
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
Yan Ma
Steffi Chern
Xuyang Shen
Yiran Zhong
Pengfei Liu
OffRL
LRM
43
1
0
03 Apr 2025
A Survey of Scaling in Large Language Model Reasoning
Zihan Chen
Song Wang
Zhen Tan
Xingbo Fu
Zhenyu Lei
Peng Wang
Huan Liu
Cong Shen
Jundong Li
LRM
84
0
0
02 Apr 2025
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL
Mohammadreza Pourreza
Shayan Talaei
Ruoxi Sun
Xingchen Wan
Hailong Li
Azalia Mirhoseini
Amin Saberi
Sercan Ö. Arik
ReLM
AI4TS
LRM
42
1
0
29 Mar 2025
Video-R1: Reinforcing Video Reasoning in MLLMs
Kaituo Feng
Kaixiong Gong
B. Li
Zonghao Guo
Yibing Wang
Tianshuo Peng
J. Wu
Xiaoying Zhang
Benyou Wang
Xiangyu Yue
AI4TS
SyDa
LRM
46
13
0
27 Mar 2025
Reasoning Beyond Limits: Advances and Open Problems for LLMs
M. Ferrag
Norbert Tihanyi
Merouane Debbah
ELM
OffRL
LRM
AI4CE
59
2
0
26 Mar 2025
Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning
Huajie Tan
Yuheng Ji
Xiaoshuai Hao
Minglan Lin
Pengwei Wang
Zhongyuan Wang
Shanghang Zhang
ReLM
OffRL
LRM
90
0
0
26 Mar 2025
Test-Time Reasoning Through Visual Human Preferences with VLMs and Soft Rewards
Alexander Gambashidze
Konstantin Sobolev
Andrey Kuznetsov
Ivan V. Oseledets
VLM
LRM
44
0
0
25 Mar 2025
OThink-MR1: Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning
Zhiyuan Liu
Yuting Zhang
Feng Liu
Changwang Zhang
Ying Sun
Jun Wang
LRM
68
2
0
20 Mar 2025
Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models
Yuxiang Lai
Jike Zhong
Ming Li
Shitian Zhao
Xiaofeng Yang
OffRL
LRM
LM&MA
VLM
68
5
0
18 Mar 2025
Aligning Multimodal LLM with Human Preference: A Survey
Tao Yu
Y. Zhang
Chaoyou Fu
Junkang Wu
Jinda Lu
...
Qingsong Wen
Z. Zhang
Yan Huang
Liang Wang
T. Tan
73
2
0
18 Mar 2025
PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing
Cheng Deng
Luoyang Sun
Jiwen Jiang
Yongcheng Zeng
Xinjian Wu
...
Haoyang Li
Lei Chen
Lionel M. Ni
H. Zhang
Jun Wang
66
0
0
15 Mar 2025
Thinking Machines: A Survey of LLM based Reasoning Strategies
Dibyanayan Bandyopadhyay
Soham Bhattacharjee
Asif Ekbal
LRM
ELM
46
4
0
13 Mar 2025
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning
Ziyu Wan
Yunxiang Li
Y. Song
Hanjing Wang
Linyi Yang
Mark W. Schmidt
J. Wang
Weinan Zhang
Shuyue Hu
Ying Wen
LLMAG
KELM
LRM
AI4CE
81
5
0
12 Mar 2025
Efficient Algorithms for Verifying Kruskal Rank in Sparse Linear Regression and Related Applications
Fengqin Zhou
43
3
0
06 Mar 2025
High-Precision Transformer-Based Visual Servoing for Humanoid Robots in Aligning Tiny Objects
Jialong Xue
Wei Gao
Yu Wang
Chao Ji
Dongdong Zhao
Shi Yan
Shiwu Zhang
40
0
0
06 Mar 2025
All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine-Tuning
Gokul Swamy
Sanjiban Choudhury
Wen Sun
Zhiwei Steven Wu
J. Andrew Bagnell
OffRL
42
7
0
03 Mar 2025
Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning
Sheng Zhang
Qianchu Liu
Guanghui Qin
Tristan Naumann
Hoifung Poon
ReLM
OffRL
LRM
73
2
0
27 Feb 2025
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning
Jiazhen Pan
Che Liu
Junde Wu
Fenglin Liu
Jiayuan Zhu
Hongwei Bran Li
Chen Chen
C. Ouyang
Daniel Rueckert
LRM
LM&MA
VLM
65
10
0
26 Feb 2025
Self-Memory Alignment: Mitigating Factual Hallucinations with Generalized Improvement
Siyuan Zhang
Y. Zhang
Yinpeng Dong
Hang Su
HILM
KELM
84
0
0
26 Feb 2025
DocPuzzle: A Process-Aware Benchmark for Evaluating Realistic Long-Context Reasoning Capabilities
Tianyi Zhuang
Chuqiao Kuang
Xiaoguang Li
Yihua Teng
Jihao Wu
Y. Wang
Lifeng Shang
RALM
ELM
LRM
65
0
0
25 Feb 2025
IPO: Your Language Model is Secretly a Preference Classifier
Shivank Garg
Ayush Singh
Shweta Singh
Paras Chopra
47
1
0
22 Feb 2025
Forgotten Polygons: Multimodal Large Language Models are Shape-Blind
William Rudman
Michal Golovanesky
Amir Bar
Vedant Palit
Yann LeCun
Carsten Eickhoff
Ritambhara Singh
LRM
47
2
0
21 Feb 2025
Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving
Xin Xu
Yan Xu
Tianhao Chen
Yuchen Yan
Chengwu Liu
...
Y. Wang
Yichun Yin
Y. Wang
Lifeng Shang
Q. Liu
LRM
59
2
0
17 Feb 2025