ResearchTrend.AI
Med-RLVR: Emerging Medical Reasoning from a 3B Base Model via Reinforcement Learning

27 February 2025
Sheng Zhang
Qianchu Liu
Guanghui Qin
Tristan Naumann
Hoifung Poon
    ReLM
    OffRL
    LRM
Abstract

Reinforcement learning from verifiable rewards (RLVR) has recently gained attention for its ability to elicit self-evolved reasoning capabilities from base language models without explicit reasoning supervision, as demonstrated by DeepSeek-R1. While prior work on RLVR has primarily focused on mathematical and coding domains, its applicability to other tasks and domains remains unexplored. In this work, we investigate whether medical reasoning can emerge from RLVR. We introduce Med-RLVR as an initial study of RLVR in the medical domain, leveraging medical multiple-choice question answering (MCQA) data as verifiable labels. Our results demonstrate that RLVR is not only effective for math and coding but also extends successfully to medical question answering. Notably, Med-RLVR achieves performance comparable to traditional supervised fine-tuning (SFT) on in-distribution tasks while significantly improving out-of-distribution generalization, with an 8-point accuracy gain. Further analysis of training dynamics reveals that, with no explicit reasoning supervision, reasoning emerges from the 3B-parameter base model. These findings underscore the potential of RLVR in domains beyond math and coding, opening new avenues for its application in knowledge-intensive fields such as medicine.
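The key idea the abstract describes — using MCQA labels as verifiable rewards — can be sketched as a simple rule-based reward function. The exact reward rules and answer format used by Med-RLVR are not given here, so the `mcqa_reward` name and the "Answer: X" output convention below are illustrative assumptions, not the paper's implementation:

```python
import re

def mcqa_reward(completion: str, gold_choice: str) -> float:
    """Rule-based verifiable reward for multiple-choice QA (illustrative sketch).

    Searches the model's completion for a final answer of the form
    "Answer: X" (X in A-D) and compares it to the gold label.
    Returns 1.0 for a correct, parseable answer and 0.0 otherwise,
    so no reasoning-trace supervision is needed -- only the MCQA label.
    """
    match = re.search(r"Answer:\s*([A-D])\b", completion)
    if match is None:
        return 0.0  # unparseable output earns no reward
    return 1.0 if match.group(1) == gold_choice else 0.0
```

In RLVR training loops of this kind, such a reward is computed per sampled completion and fed to a policy-gradient update; because the reward is checked programmatically against the label, no learned reward model is required.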

View on arXiv
@article{zhang2025_2502.19655,
  title={Med-RLVR: Emerging Medical Reasoning from a 3B Base Model via Reinforcement Learning},
  author={Sheng Zhang and Qianchu Liu and Guanghui Qin and Tristan Naumann and Hoifung Poon},
  journal={arXiv preprint arXiv:2502.19655},
  year={2025}
}