Dialogue is Better Than Monologue: Instructing Medical LLMs via Strategical Conversations

Main: 8 pages; Bibliography: 2 pages; Appendix: 5 pages; 8 figures; 3 tables
Abstract

Current medical AI systems often fail to replicate real-world clinical reasoning, as they are predominantly trained and evaluated on static text and question-answer tasks. These tuning methods and benchmarks overlook critical aspects like evidence-based reasoning and handling distracting information. To bridge this gap, we introduce a novel benchmark that simulates real-world diagnostic scenarios, integrating noise and difficulty levels aligned with USMLE standards. Moreover, we explore dialogue-based fine-tuning, which transforms static datasets into conversational formats to better capture iterative reasoning processes. Experiments show that dialogue-tuned models outperform traditional methods, with improvements of 9.64% in multi-round reasoning scenarios and 6.18% in accuracy in a noisy environment. Our findings highlight dialogue tuning as a promising approach for advancing clinically aligned and robust medical AI systems.
