R1-Code-Interpreter: LLMs Reason with Code via Supervised and Multi-stage Reinforcement Learning

27 May 2025
Yongchao Chen
Y. Liu
Junwei Zhou
Yilun Hao
Jingquan Wang
Yang Zhang
Chuchu Fan
OffRL, ReLM, AI4TS, SyDa, ALM, LRM
ArXiv (abs) | PDF | HTML | HuggingFace (2 upvotes) | Github (23★)

Papers citing "R1-Code-Interpreter: LLMs Reason with Code via Supervised and Multi-stage Reinforcement Learning"

14 / 14 papers shown
A Survey on Agentic Multimodal Large Language Models
Huanjin Yao
Ruifei Zhang
Jiaxing Huang
Jingyi Zhang
Yibo Wang
...
Ruolin Zhu
Yongcheng Jing
Shunyu Liu
Guanbin Li
Dacheng Tao
LM&Ro, AIFin, AI4TS, LRM, AI4CE
246
4
0
13 Oct 2025
How Many Code and Test Cases Are Enough? Evaluating Test Cases Generation from a Binary-Matrix Perspective
Xianzhen Luo
Jinyang Huang
Wenzhen Zheng
Qingfu Zhu
Mingzheng Xu
Yiheng Xu
YuanTao Fan
L. Qin
Wanxiang Che
96
2
0
09 Oct 2025
Learning to Reason for Hallucination Span Detection
Hsuan Su
Ting-Yao Hu
H. Koppula
Kundan Krishna
Hadi Pouransari
Cheng-Yu Hsieh
Cem Koc
Joseph Y Cheng
Oncel Tuzel
Raviteja Vemulapalli
ReLM, OffRL, HILM, LRM
249
2
0
02 Oct 2025
Learning How to Use Tools, Not Just When: Pattern-Aware Tool-Integrated Reasoning
Ningning Xu
Yuxuan Jiang
Shubhashis Roy Dipta
Hengyuan Zhang
LRM
133
1
0
27 Sep 2025
Learning to Reason in Structured In-context Environments with Reinforcement Learning
Peng Yu
Zeyuan Zhao
Shao Zhang
Luoyi Fu
Xinbing Wang
Ying Wen
OffRL, LRM
177
0
0
27 Sep 2025
WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning
Zimu Lu
Houxing Ren
Yunqiao Yang
Ke Wang
Zhuofan Zong
Junting Pan
Mingjie Zhan
Jiaming Song
LLMAG
129
0
0
26 Sep 2025
NIRVANA: Structured pruning reimagined for large language models compression
Mengting Ai
Tianxin Wei
Sirui Chen
Jingrui He
VLM
1.6K
1
0
17 Sep 2025
ToolRL: Reward is All Tool Learning Needs
Cheng Qian
Emre Can Acikgoz
Qi He
Hongru Wang
Xiusi Chen
Dilek Hakkani-Tur
Gokhan Tur
Heng Ji
OffRL, LRM
536
146
0
16 Apr 2025
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Haozhe Wang
Chao Qu
Zuming Huang
Wei Chu
Fangzhen Lin
Lei Ma
OffRL, ReLM, SyDa, LRM, VLM
486
171
0
10 Apr 2025
ToRL: Scaling Tool-Integrated RL
Xuefeng Li
Haoyang Zou
Pengfei Liu
OffRL, LRM
412
76
0
30 Mar 2025
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Jingyi Zhang
Jiaxing Huang
Huanjin Yao
Shunyu Liu
Xikun Zhang
Shijian Lu
Dacheng Tao
LRM
385
200
0
17 Mar 2025
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Bowen Jin
Hansi Zeng
Zhenrui Yue
Jinsung Yoon
Sercan O. Arik
Dong Wang
Hamed Zamani
Jiawei Han
OffRL, AI4TS, LRM, RALM, ReLM, KELM
807
560
0
12 Mar 2025
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Tianzhe Chu
Yuexiang Zhai
Jihan Yang
Shengbang Tong
Saining Xie
Dale Schuurmans
Quoc V. Le
Sergey Levine
Yi-An Ma
OffRL
675
404
0
28 Jan 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
OffRL, AI4TS, LRM, ReLM, VLM
1.2K
5,342
0
22 Jan 2025