This is the third technical report on the development of slow-thinking models as part of the STILL project. As the technical pathway becomes clearer, scaling RL training has become a central technique for building such reasoning models. We systematically experiment with and document the effects of various factors that influence RL training, conducting experiments on both base models and fine-tuned models. Specifically, we demonstrate that our RL training approach consistently improves the Qwen2.5-32B base model, increasing both response length and test accuracy. Furthermore, we show that even a model that has already reached a high performance level, such as DeepSeek-R1-Distill-Qwen-1.5B, can be further refined through RL training, reaching an accuracy of 39.33% on AIME 2024. Beyond RL training, we also explore tool manipulation and find that it significantly boosts the reasoning performance of large reasoning models, achieving a remarkable accuracy of 86.67% with greedy search on AIME 2024 and underscoring its effectiveness in enhancing model capabilities. We release our resources at the STILL project website.
@article{chen2025_2503.04548,
  title={An Empirical Study on Eliciting and Improving R1-like Reasoning Models},
  author={Zhipeng Chen and Yingqian Min and Beichen Zhang and Jie Chen and Jinhao Jiang and Daixuan Cheng and Wayne Xin Zhao and Zheng Liu and Xu Miao and Yang Lu and Lei Fang and Zhongyuan Wang and Ji-Rong Wen},
  journal={arXiv preprint arXiv:2503.04548},
  year={2025}
}