SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild

24 March 2025
Weihao Zeng
Yuzhen Huang
Qian Liu
Wei Liu
Keqing He
Zejun Ma
Junxian He
    OffRL
    ReLM
    LRM

Papers citing "SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild"

21 / 21 papers shown
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang
Qing Yang
Zhiyuan Zeng
Liliang Ren
L. Liu
...
Jianfeng Gao
Weizhu Chen
S. Wang
Simon S. Du
Yelong Shen
OffRL
ReLM
LRM
100
1
0
29 Apr 2025
TTRL: Test-Time Reinforcement Learning
Yuxin Zuo
Kaiyan Zhang
Shang Qu
Li Sheng
Xuekai Zhu
Biqing Qi
Youbang Sun
Ganqu Cui
Ning Ding
Bowen Zhou
OffRL
17
1
0
22 Apr 2025
Tina: Tiny Reasoning Models via LoRA
Shangshang Wang
Julian Asilis
Ömer Faruk Akgül
Enes Burak Bilgin
Ollie Liu
W. Neiswanger
OffRL
LRM
15
0
0
22 Apr 2025
Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning
Jie Cheng
Ruixi Qiao
Lijun Li
Chao Guo
J. Z. Wang
Gang Xiong
Yisheng Lv
Fei-Yue Wang
LRM
27
0
0
21 Apr 2025
OTC: Optimal Tool Calls via Reinforcement Learning
Hongru Wang
Cheng Qian
Wanjun Zhong
X. Chen
Jiahao Qiu
Shijue Huang
Bowen Jin
Mengdi Wang
Kam-Fai Wong
Heng Ji
OffRL
LRM
19
0
0
21 Apr 2025
Learning to Reason under Off-Policy Guidance
Jianhao Yan
Yafu Li
Zican Hu
Zhi Wang
Ganqu Cui
Xiaoye Qu
Yu Cheng
Yue Zhang
OffRL
LRM
22
0
0
21 Apr 2025
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Yang Yue
Zhiqi Chen
Rui Lu
Andrew Zhao
Zhaokai Wang
Yang Yue
Shiji Song
Gao Huang
ReLM
LRM
25
1
0
18 Apr 2025
Open-Medical-R1: How to Choose Data for RLVR Training at Medicine Domain
Zhongxi Qiu
Zhang Zhang
Yan Hu
Heng Li
Jiang-Dong Liu
OffRL
20
0
0
16 Apr 2025
Efficient Reasoning Models: A Survey
Sicheng Feng
Gongfan Fang
Xinyin Ma
Xinchao Wang
ReLM
LRM
29
0
0
15 Apr 2025
Nemotron-CrossThink: Scaling Self-Learning beyond Math Reasoning
Syeda Nahida Akter
Shrimai Prabhumoye
Matvei Novikov
Seungju Han
Ying Lin
...
Eric Nyberg
Yejin Choi
M. Patwary
M. Shoeybi
Bryan Catanzaro
ReLM
OffRL
LRM
37
0
1
15 Apr 2025
Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time
Wang Yang
Xiang Yue
V. Chaudhary
Xiaotian Han
ReLM
LRM
39
1
0
12 Apr 2025
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Haozhe Wang
C. Qu
Zuming Huang
Wei Chu
Fangzhen Lin
Wenhu Chen
OffRL
ReLM
SyDa
LRM
VLM
51
1
0
10 Apr 2025
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
Hardy Chen
Haoqin Tu
Fali Wang
Hui Liu
X. Tang
Xinya Du
Yuyin Zhou
Cihang Xie
ReLM
VLM
OffRL
LRM
37
5
0
10 Apr 2025
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
Andreas Hochlehnert
Hardik Bhatnagar
Vishaal Udandarao
Samuel Albanie
Ameya Prabhu
Matthias Bethge
ReLM
ALM
LRM
43
2
0
09 Apr 2025
Leanabell-Prover: Posttraining Scaling in Formal Reasoning
Jingyuan Zhang
Qi Wang
Xingguang Ji
Y. Liu
Yang Yue
Fuzheng Zhang
Di Zhang
Guorui Zhou
Kun Gai
LRM
16
1
0
08 Apr 2025
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
Taiwei Shi
Yiyang Wu
Linxin Song
Tianyi Zhou
Jieyu Zhao
LRM
59
1
0
07 Apr 2025
Concise Reasoning via Reinforcement Learning
Mehdi Fatemi
Banafsheh Rafiee
Mingjie Tang
Kartik Talamadupula
ReLM
OffRL
LRM
25
3
0
07 Apr 2025
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
Yan Ma
Steffi Chern
Xuyang Shen
Yiran Zhong
Pengfei Liu
OffRL
LRM
36
1
0
03 Apr 2025
ThinkPrune: Pruning Long Chain-of-Thought of LLMs via Reinforcement Learning
Bairu Hou
Yang Zhang
Jiabao Ji
Yujian Liu
Kaizhi Qian
Jacob Andreas
Shiyu Chang
OffRL
LRM
43
3
0
02 Apr 2025
How Difficulty-Aware Staged Reinforcement Learning Enhances LLMs' Reasoning Capabilities: A Preliminary Experimental Study
Yunjie Ji
Sitong Zhao
Xiaoyu Tian
Haotian Wang
Shuaiting Chen
Yiping Peng
Han Zhao
Xiangang Li
LRM
29
1
0
01 Apr 2025
FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models
Mingyang Song
Mao Zheng
Zheng Li
Wenjie Yang
Xuan Luo
Yue Pan
Feng Zhang
ReLM
LRM
48
2
0
21 Mar 2025