ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2503.20783
  4. Cited By
Understanding R1-Zero-Like Training: A Critical Perspective

Understanding R1-Zero-Like Training: A Critical Perspective

26 March 2025
Zichen Liu
Changyu Chen
Wenjun Li
Penghui Qi
Tianyu Pang
Chao Du
Wee Sun Lee
Min Lin
    OffRL
    LRM
ArXivPDFHTML

Papers citing "Understanding R1-Zero-Like Training: A Critical Perspective"

33 / 33 papers shown
Title
CEC-Zero: Chinese Error Correction Solution Based on LLM
CEC-Zero: Chinese Error Correction Solution Based on LLM
Sophie Zhang
Zhiming Lin
9
0
0
14 May 2025
DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models
DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models
Xiwen Chen
Wenhui Zhu
Peijie Qiu
Xuanzhao Dong
Hao Wang
Haiyu Wu
Huayu Li
Aristeidis Sotiras
Y. Wang
Abolfazl Razi
ALM
30
0
0
14 May 2025
Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation
Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation
Enci Zhang
Xingang Yan
Wei Lin
Tianxiang Zhang
Qianchun Lu
LRM
16
0
0
13 May 2025
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains
Qianchu Liu
Sheng Zhang
Guanghui Qin
Timothy Ossowski
Yu Gu
...
Sam Preston
Mu-Hsin Wei
Paul Vozila
Tristan Naumann
Hoifung Poon
OOD
LRM
VLM
47
0
0
06 May 2025
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
Yi-Fan Zhang
Xingyu Lu
X. Hu
Chaoyou Fu
Bin Wen
...
J. Chen
Fan Yang
Z. Zhang
Tingting Gao
Liang Wang
OffRL
LRM
41
0
0
05 May 2025
Exploring the Potential of Offline RL for Reasoning in LLMs: A Preliminary Study
Exploring the Potential of Offline RL for Reasoning in LLMs: A Preliminary Study
Xiaoyu Tian
Sitong Zhao
Haotian Wang
Shuaiting Chen
Yiping Peng
Yunjie Ji
Han Zhao
Xiangang Li
OffRL
LRM
29
0
0
04 May 2025
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Guanghao Zhou
Panjia Qiu
C. L. P. Chen
J. Wang
Zheming Yang
Jian Xu
Minghui Qiu
OffRL
LRM
53
0
0
30 Apr 2025
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang
Qing Yang
Zhiyuan Zeng
Liliang Ren
L. Liu
...
Jianfeng Gao
Weizhu Chen
S. Wang
Simon S. Du
Yelong Shen
OffRL
ReLM
LRM
118
2
0
29 Apr 2025
Pushing the boundary on Natural Language Inference
Pushing the boundary on Natural Language Inference
Pablo Miralles-González
Javier Huertas-Tato
Alejandro Martín
David Camacho
LRM
44
0
0
25 Apr 2025
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning
Z. Wang
K. Wang
Q. Wang
Pingyue Zhang
Linjie Li
...
Jiajun Wu
L. Fei-Fei
Lijuan Wang
Yejin Choi
Manling Li
84
1
0
24 Apr 2025
TTRL: Test-Time Reinforcement Learning
TTRL: Test-Time Reinforcement Learning
Yuxin Zuo
Kaiyan Zhang
Shang Qu
Li Sheng
Xuekai Zhu
Biqing Qi
Youbang Sun
Ganqu Cui
Ning Ding
Bowen Zhou
OffRL
100
1
0
22 Apr 2025
Tina: Tiny Reasoning Models via LoRA
Tina: Tiny Reasoning Models via LoRA
Shangshang Wang
Julian Asilis
Ömer Faruk Akgül
Enes Burak Bilgin
Ollie Liu
W. Neiswanger
OffRL
LRM
29
1
0
22 Apr 2025
Learning to Reason under Off-Policy Guidance
Learning to Reason under Off-Policy Guidance
Jianhao Yan
Yafu Li
Zican Hu
Zhi Wang
Ganqu Cui
Xiaoye Qu
Yu Cheng
Yue Zhang
OffRL
LRM
39
0
0
21 Apr 2025
Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning
Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning
Jie Cheng
Ruixi Qiao
Lijun Li
Chao Guo
J. Z. Wang
Gang Xiong
Yisheng Lv
Fei-Yue Wang
LRM
107
0
0
21 Apr 2025
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Yang Yue
Zhiqi Chen
Rui Lu
Andrew Zhao
Zhaokai Wang
Yang Yue
Shiji Song
Gao Huang
ReLM
LRM
42
11
0
18 Apr 2025
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
Xiangyan Liu
Jinjie Ni
Zijian Wu
Chao Du
Longxu Dou
H. Wang
Tianyu Pang
Michael Shieh
OffRL
LRM
100
0
0
17 Apr 2025
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
Siyan Zhao
Devaansh Gupta
Qinqing Zheng
Aditya Grover
DiffM
LRM
AI4CE
42
1
0
16 Apr 2025
FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos
FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos
Rui Chen
Lei Sun
Jing Tang
Geng Li
Xiangxiang Chu
LRM
24
0
0
14 Apr 2025
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning
Xingjian Zhang
Siwei Wen
Wenjun Wu
Lei Huang
LRM
32
1
0
13 Apr 2025
GRPO-LEAD: A Difficulty-Aware Reinforcement Learning Approach for Concise Mathematical Reasoning in Language Models
GRPO-LEAD: A Difficulty-Aware Reinforcement Learning Approach for Concise Mathematical Reasoning in Language Models
Jixiao Zhang
Chunsheng Zuo
LRM
32
2
0
13 Apr 2025
Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time
Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time
Wang Yang
Xiang Yue
V. Chaudhary
Xiaotian Han
ReLM
LRM
65
1
0
12 Apr 2025
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
Rosie Zhao
Alexandru Meterez
Sham Kakade
C. Pehlevan
Samy Jelassi
Eran Malach
ReLM
LRM
76
2
0
10 Apr 2025
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model
Haozhan Shen
Peng Liu
J. Li
Chunxin Fang
Yibo Ma
...
Zilun Zhang
Kangjia Zhao
Qianqian Zhang
Ruochen Xu
Tiancheng Zhao
VLM
LRM
74
25
0
10 Apr 2025
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Haozhe Wang
C. Qu
Zuming Huang
Wei Chu
Fangzhen Lin
Wenhu Chen
OffRL
ReLM
SyDa
LRM
VLM
72
1
0
10 Apr 2025
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
Andreas Hochlehnert
Hardik Bhatnagar
Vishaal Udandarao
Samuel Albanie
Ameya Prabhu
Matthias Bethge
ReLM
ALM
LRM
89
4
0
09 Apr 2025
Leanabell-Prover: Posttraining Scaling in Formal Reasoning
Leanabell-Prover: Posttraining Scaling in Formal Reasoning
Jingyuan Zhang
Qi Wang
Xingguang Ji
Y. Liu
Yang Yue
Fuzheng Zhang
Di Zhang
Guorui Zhou
Kun Gai
LRM
34
2
0
08 Apr 2025
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
Yu Yue
Yufeng Yuan
Qiying Yu
Xiaochen Zuo
Ruofei Zhu
...
Ru Zhang
Xin Liu
Mingxuan Wang
Yonghui Wu
Lin Yan
OffRL
LRM
29
5
0
07 Apr 2025
Rethinking Reflection in Pre-Training
Rethinking Reflection in Pre-Training
Essential AI
Darsh J Shah
Peter Rushton
Somanshu Singla
Mohit Parmar
...
Philip Monk
Platon Mazarakis
Ritvik Kapila
Saurabh Srivastava
Tim Romanski
ReLM
LRM
43
3
0
05 Apr 2025
CrystalFormer-RL: Reinforcement Fine-Tuning for Materials Design
CrystalFormer-RL: Reinforcement Fine-Tuning for Materials Design
Zhendong Cao
Lei Wang
AI4CE
33
1
0
03 Apr 2025
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
Yan Ma
Steffi Chern
Xuyang Shen
Yiran Zhong
Pengfei Liu
OffRL
LRM
43
1
0
03 Apr 2025
How Difficulty-Aware Staged Reinforcement Learning Enhances LLMs' Reasoning Capabilities: A Preliminary Experimental Study
How Difficulty-Aware Staged Reinforcement Learning Enhances LLMs' Reasoning Capabilities: A Preliminary Experimental Study
Yunjie Ji
Sitong Zhao
Xiaoyu Tian
Haotian Wang
Shuaiting Chen
Yiping Peng
Han Zhao
Xiangang Li
LRM
44
2
0
01 Apr 2025
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Xiaoye Qu
Yafu Li
Zhaochen Su
Weigao Sun
Jianhao Yan
...
Chaochao Lu
Yue Zhang
Xian-Sheng Hua
Bowen Zhou
Yu Cheng
ReLM
OffRL
LRM
80
14
0
27 Mar 2025
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
Weihao Zeng
Yuzhen Huang
Qian Liu
Wei Liu
Keqing He
Zejun Ma
Junxian He
OffRL
ReLM
LRM
91
31
0
24 Mar 2025
1