Phi-4-reasoning Technical Report

30 April 2025
Marah Abdin
Sahaj Agarwal
Ahmed Hassan Awadallah
Vidhisha Balachandran
Harkirat Singh Behl
Lingjiao Chen
Gustavo de Rosa
Suriya Gunasekar
Mojan Javaheripi
Neel Joshi
Piero Kauffmann
Yash Lara
Caio César Teodoro Mendes
Arindam Mitra
Besmira Nushi
Dimitris Papailiopoulos
Olli Saarikivi
Shital Shah
Vaishnavi Shrivastava
Vibhav Vineet
Yue Wu
Safoora Yousefi
Guoqing Zheng
Abstract

We introduce Phi-4-reasoning, a 14-billion parameter reasoning model that achieves strong performance on complex reasoning tasks. Trained via supervised fine-tuning of Phi-4 on a carefully curated set of "teachable" prompts, selected for the right level of complexity and diversity, together with reasoning demonstrations generated using o3-mini, Phi-4-reasoning produces detailed reasoning chains that effectively leverage inference-time compute. We further develop Phi-4-reasoning-plus, a variant enhanced through a short phase of outcome-based reinforcement learning that offers higher performance by generating longer reasoning traces. Across a wide range of reasoning tasks, both models outperform significantly larger open-weight models such as DeepSeek-R1-Distill-Llama-70B and approach the performance of the full DeepSeek-R1 model. Our comprehensive evaluations span benchmarks in math and scientific reasoning, coding, algorithmic problem solving, planning, and spatial understanding. Interestingly, we observe a non-trivial transfer of improvements to general-purpose benchmarks as well. In this report, we provide insights into our training data, our training methodologies, and our evaluations. We show that the benefit of careful data curation for supervised fine-tuning (SFT) extends to reasoning language models and can be further amplified by reinforcement learning (RL). Finally, our evaluation points to opportunities for improving how we assess the performance and robustness of reasoning models.
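As a quick illustration of the abstract's point that the model spends inference-time compute on long reasoning chains, below is a minimal sketch of sampling from the released checkpoint with Hugging Face transformers. The repository id "microsoft/Phi-4-reasoning", the prompt, and the sampling settings are assumptions for illustration, not details taken from this page.

# Minimal sketch: sampling a long reasoning trace from Phi-4-reasoning.
# The repo id "microsoft/Phi-4-reasoning" is an assumed Hugging Face id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-reasoning"  # assumption, not from the abstract
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# A single-turn prompt; reasoning models emit their chain of thought
# before the final answer, so leave a generous new-token budget.
messages = [{"role": "user", "content": "How many primes are below 100?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids, max_new_tokens=4096, do_sample=True, temperature=0.8
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))

At 14 billion parameters the bfloat16 weights should fit on a single high-memory GPU; device_map="auto" will otherwise shard the model across available devices.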

View on arXiv: https://arxiv.org/abs/2504.21318
@article{abdin2025_2504.21318,
  title={Phi-4-reasoning Technical Report},
  author={Marah Abdin and Sahaj Agarwal and Ahmed Awadallah and Vidhisha Balachandran and Harkirat Behl and Lingjiao Chen and Gustavo de Rosa and Suriya Gunasekar and Mojan Javaheripi and Neel Joshi and Piero Kauffmann and Yash Lara and Caio César Teodoro Mendes and Arindam Mitra and Besmira Nushi and Dimitris Papailiopoulos and Olli Saarikivi and Shital Shah and Vaishnavi Shrivastava and Vibhav Vineet and Yue Wu and Safoora Yousefi and Guoqing Zheng},
  journal={arXiv preprint arXiv:2504.21318},
  year={2025}
}