What Does Neuro Mean to Cardio? Investigating the Role of Clinical Specialty Data in Medical LLMs

Abstract

In this paper, we introduce S-MedQA, an English medical question-answering (QA) dataset for benchmarking large language models in fine-grained clinical specialties. We use S-MedQA to test whether a popular hypothesis about knowledge injection holds in the knowledge-intensive setting of medical QA, and show that: 1) training on data from a specialty does not necessarily lead to the best performance on that specialty, and 2) regardless of the specialty fine-tuned on, token probabilities of clinically relevant terms for all specialties increase consistently. Thus, we believe the gains come mostly from domain shifting (e.g., general to medical) rather than knowledge injection, and we suggest rethinking the role of fine-tuning data in the medical domain. We release S-MedQA and all code needed to reproduce our experiments to the research community.

@article{yan2025_2505.10113,
  title={What Does Neuro Mean to Cardio? Investigating the Role of Clinical Specialty Data in Medical LLMs},
  author={Xinlan Yan and Di Wu and Yibin Lei and Christof Monz and Iacer Calixto},
  journal={arXiv preprint arXiv:2505.10113},
  year={2025}
}