ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning

11 June 2025

Main: 10 pages; Bibliography: 2 pages; Appendix: 12 pages; 7 figures; 8 tables

Abstract

Although reasoning-based large language models (LLMs) have excelled in mathematics and programming, their capabilities in knowledge-intensive medical question answering remain underexplored. To address this gap, we introduce ReasonMed, the largest medical reasoning dataset, comprising 370k high-quality examples distilled from 1.7 million initial reasoning paths generated by multiple LLMs. ReasonMed is constructed through a multi-agent verification and refinement process, in which an Error Refiner improves reasoning paths by identifying and correcting the error-prone steps flagged by a verifier. Using ReasonMed, we systematically investigate best practices for training medical reasoning models and find that combining detailed Chain-of-Thought (CoT) reasoning with concise answer summaries is the most effective fine-tuning strategy. With this strategy, we train ReasonMed-7B, which sets a new state of the art for sub-10B models, outperforming the prior best by 4.17% and even exceeding LLaMA3.1-70B on PubMedQA by 4.60%.
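The abstract describes the construction pipeline only at a high level. The sketch below illustrates the general verify-then-refine filtering idea under stated assumptions: `ReasoningPath`, `verify_and_refine`, `build_dataset`, `to_training_target`, `max_refinements`, the toy verifier and refiner, and the "Summary:"/"Answer:" target template are all hypothetical names chosen for illustration, not the authors' code or prompts. In the real pipeline the verifier and Error Refiner are LLM agents; here they are plain Python callables so the control flow can run on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ReasoningPath:
    """One candidate chain of thought for a question (hypothetical structure)."""
    question: str
    steps: List[str]
    answer: str


# Hypothetical agent interfaces: in practice these would wrap LLM calls.
# A verifier returns indices of steps it flags as error-prone ([] = passed).
Verifier = Callable[[ReasoningPath], List[int]]
# A refiner rewrites the flagged steps and returns a new path.
Refiner = Callable[[ReasoningPath, List[int]], ReasoningPath]


def verify_and_refine(path: ReasoningPath, verifier: Verifier, refiner: Refiner,
                      max_refinements: int = 1) -> Optional[ReasoningPath]:
    """Return the path if it verifies (possibly after refinement), else None."""
    for attempt in range(max_refinements + 1):
        flagged = verifier(path)
        if not flagged:
            return path  # clean path: keep it
        if attempt < max_refinements:
            path = refiner(path, flagged)  # repair only the flagged steps
    return None  # still flagged after all refinement attempts: discard


def build_dataset(paths: List[ReasoningPath], verifier: Verifier, refiner: Refiner,
                  max_refinements: int = 1) -> List[ReasoningPath]:
    """Filter a pool of candidate paths down to verified ones."""
    kept = []
    for p in paths:
        result = verify_and_refine(p, verifier, refiner, max_refinements)
        if result is not None:
            kept.append(result)
    return kept


def to_training_target(path: ReasoningPath, summary: str) -> str:
    """Hypothetical SFT target: detailed CoT steps followed by a concise summary."""
    cot = "\n".join(f"Step {i + 1}: {s}" for i, s in enumerate(path.steps))
    return f"{cot}\n\nSummary: {summary}\nAnswer: {path.answer}"


def toy_verifier(path: ReasoningPath) -> List[int]:
    # Stand-in for an LLM verifier: flag any step containing the marker "??".
    return [i for i, s in enumerate(path.steps) if "??" in s]


def toy_refiner(path: ReasoningPath, flagged: List[int]) -> ReasoningPath:
    # Stand-in for the Error Refiner: rewrite only the flagged steps.
    fixed = [s.replace("??", "").strip() + " (revised)" if i in flagged else s
             for i, s in enumerate(path.steps)]
    return ReasoningPath(path.question, fixed, path.answer)


if __name__ == "__main__":
    candidates = [
        ReasoningPath("Q1", ["Summarize the vignette.", "Apply the criterion."], "B"),
        ReasoningPath("Q2", ["Summarize the vignette.", "Apply the criterion ??"], "C"),
    ]
    kept = build_dataset(candidates, toy_verifier, toy_refiner)
    print(len(kept))  # prints 2: Q2 is repaired rather than discarded
    print(to_training_target(kept[1], "The revised criterion points to C."))
```

Passing the verifier and refiner as callables keeps the filtering loop independent of which models play those roles. The `max_refinements` cap limits how many repair attempts a flagged path gets before it is discarded, which is one simple way a large pool of candidate paths could shrink to a smaller curated set.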


@article{sun2025_2506.09513,
  title={ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning},
  author={Yu Sun and Xingyu Qian and Weiwen Xu and Hao Zhang and Chenghao Xiao and Long Li and Yu Rong and Wenbing Huang and Qifeng Bai and Tingyang Xu},
  journal={arXiv preprint arXiv:2506.09513},
  year={2025}
}
