Disentangling Reasoning and Knowledge in Medical Large Language Models

16 May 2025
Rahul Thapa
Qingyang Wu
Kevin Wu
Harrison Zhang
Angela Zhang
Eric Wu
Haotian Ye
Suhana Bedi
Nevin Aresh
Joseph Boen
Shriya Reddy
Ben Athiwaratkun
Shuaiwen Leon Song
James Zou
Communities: ELM, AI4MH, LM&MA, LRM
Abstract

Medical reasoning in large language models (LLMs) aims to emulate clinicians' diagnostic thinking, but current benchmarks such as MedQA-USMLE, MedMCQA, and PubMedQA often mix reasoning with factual recall. We address this by separating 11 biomedical QA benchmarks into reasoning- and knowledge-focused subsets using a PubMedBERT classifier that reaches 81 percent accuracy, comparable to human performance. Our analysis shows that only 32.8 percent of questions require complex reasoning. We evaluate biomedical models (HuatuoGPT-o1, MedReason, m1) and general-domain models (DeepSeek-R1, o4-mini, Qwen3), finding consistent gaps between knowledge and reasoning performance. For example, m1 scores 60.5 on knowledge but only 47.1 on reasoning. In adversarial tests where models are misled with incorrect initial reasoning, biomedical models degrade sharply, while larger or RL-trained general models show more robustness. To address this, we train BioMed-R1 using fine-tuning and reinforcement learning on reasoning-heavy examples. It achieves the strongest performance among similarly sized models. Further gains may come from incorporating clinical case reports and training with adversarial and backtracking scenarios.
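The reasoning/knowledge split described in the abstract could be approximated with a sketch like the one below, using a PubMedBERT-style encoder with a binary classification head. The checkpoint name, label mapping, and example questions are illustrative assumptions, not the authors' released classifier, and the head would need to be fine-tuned on annotated reasoning/knowledge labels before the split is meaningful.

# Minimal sketch: bucket biomedical QA items into "knowledge" vs. "reasoning"
# with a PubMedBERT-style sequence classifier.
# NOTE: the checkpoint name and label order (0 = knowledge, 1 = reasoning) are
# assumptions for illustration; the classification head loaded here is freshly
# initialized and must be fine-tuned before its predictions are useful.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext"  # assumed base encoder

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def classify_question(question: str) -> str:
    """Return 'reasoning' or 'knowledge' for a single QA item."""
    inputs = tokenizer(question, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return "reasoning" if logits.argmax(dim=-1).item() == 1 else "knowledge"

# Example usage: split a small batch of (hypothetical) benchmark questions
questions = [
    "Which enzyme is deficient in phenylketonuria?",
    "A 62-year-old man presents with crushing chest pain radiating to the jaw...",
]
subsets = {"knowledge": [], "reasoning": []}
for q in questions:
    subsets[classify_question(q)].append(q)
print({k: len(v) for k, v in subsets.items()})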

@article{thapa2025_2505.11462,
  title={Disentangling Reasoning and Knowledge in Medical Large Language Models},
  author={Rahul Thapa and Qingyang Wu and Kevin Wu and Harrison Zhang and Angela Zhang and Eric Wu and Haotian Ye and Suhana Bedi and Nevin Aresh and Joseph Boen and Shriya Reddy and Ben Athiwaratkun and Shuaiwen Leon Song and James Zou},
  journal={arXiv preprint arXiv:2505.11462},
  year={2025}
}