2505.21668
R1-Code-Interpreter: LLMs Reason with Code via Supervised and Multi-stage Reinforcement Learning
27 May 2025
Yongchao Chen
Y. Liu
Junwei Zhou
Yilun Hao
Jingquan Wang
Yang Zhang
Chuchu Fan
OffRL
ReLM
AI4TS
SyDa
ALM
LRM
HuggingFace (2 upvotes)
Github (23★)
Papers citing
"R1-Code-Interpreter: LLMs Reason with Code via Supervised and Multi-stage Reinforcement Learning"
14 / 14 papers shown
A Survey on Agentic Multimodal Large Language Models
Huanjin Yao
Ruifei Zhang
Jiaxing Huang
Jingyi Zhang
Yibo Wang
...
Ruolin Zhu
Yongcheng Jing
Shunyu Liu
Guanbin Li
Dacheng Tao
LM&Ro
AIFin
AI4TS
LRM
AI4CE
246
4
0
13 Oct 2025
How Many Code and Test Cases Are Enough? Evaluating Test Cases Generation from a Binary-Matrix Perspective
Xianzhen Luo
Jinyang Huang
Wenzhen Zheng
Qingfu Zhu
Mingzheng Xu
Yiheng Xu
YuanTao Fan
L. Qin
Wanxiang Che
96
2
0
09 Oct 2025
Learning to Reason for Hallucination Span Detection
Hsuan Su
Ting-Yao Hu
H. Koppula
Kundan Krishna
Hadi Pouransari
Cheng-Yu Hsieh
Cem Koc
Joseph Y Cheng
Oncel Tuzel
Raviteja Vemulapalli
ReLM
OffRL
HILM
LRM
249
2
0
02 Oct 2025
Learning How to Use Tools, Not Just When: Pattern-Aware Tool-Integrated Reasoning
Ningning Xu
Yuxuan Jiang
Shubhashis Roy Dipta
Hengyuan Zhang
LRM
133
1
0
27 Sep 2025
Learning to Reason in Structured In-context Environments with Reinforcement Learning
Peng Yu
Zeyuan Zhao
Shao Zhang
Luoyi Fu
Xinbing Wang
Ying Wen
OffRL
LRM
177
0
0
27 Sep 2025
WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning
Zimu Lu
Houxing Ren
Yunqiao Yang
Ke Wang
Zhuofan Zong
Junting Pan
Mingjie Zhan
Jiaming Song
LLMAG
129
0
0
26 Sep 2025
NIRVANA: Structured pruning reimagined for large language models compression
Mengting Ai
Tianxin Wei
Sirui Chen
Jingrui He
VLM
1.6K
1
0
17 Sep 2025
ToolRL: Reward is All Tool Learning Needs
Cheng Qian
Emre Can Acikgoz
Qi He
Hongru Wang
Xiusi Chen
Dilek Hakkani-Tur
Gokhan Tur
Heng Ji
OffRL
LRM
536
146
0
16 Apr 2025
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Haozhe Wang
Chao Qu
Zuming Huang
Wei Chu
Fangzhen Lin
Lei Ma
OffRL
ReLM
SyDa
LRM
VLM
486
171
0
10 Apr 2025
ToRL: Scaling Tool-Integrated RL
Xuefeng Li
Haoyang Zou
Pengfei Liu
OffRL
LRM
412
76
0
30 Mar 2025
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Jingyi Zhang
Jiaxing Huang
Huanjin Yao
Shunyu Liu
Xikun Zhang
Shijian Lu
Dacheng Tao
LRM
385
200
0
17 Mar 2025
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Bowen Jin
Hansi Zeng
Zhenrui Yue
Jinsung Yoon
Sercan O. Arik
Dong Wang
Hamed Zamani
Jiawei Han
OffRL
AI4TS
LRM
RALM
ReLM
KELM
807
560
0
12 Mar 2025
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Tianzhe Chu
Yuexiang Zhai
Jihan Yang
Shengbang Tong
Saining Xie
Dale Schuurmans
Quoc V. Le
Sergey Levine
Yi-An Ma
OffRL
675
404
0
28 Jan 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
OffRL
AI4TS
LRM
ReLM
VLM
1.2K
5,342
0
22 Jan 2025