MedExQA: Medical Question Answering Benchmark with Multiple Explanations

10 June 2024

Papers citing "MedExQA: Medical Question Answering Benchmark with Multiple Explanations"

29 / 29 papers shown

Safer in Translation? Presupposition Robustness in Indic Languages

03 Nov 2025

CGBench: Benchmarking Language Model Scientific Reasoning for Clinical Genetics Research

13 Oct 2025

Inflated Excellence or True Performance? Rethinking Medical Diagnostic Benchmarks with Dynamic Evaluation

10 Oct 2025

Risk Profiling and Modulation for LLMs

27 Sep 2025

Filling in the Clinical Gaps in Benchmark: Case for HealthBench for the Japanese medical system

22 Sep 2025

MedBench-IT: A Comprehensive Benchmark for Evaluating Large Language Models on Italian Medical Entrance Examinations

Ruggero Marino Lazzaroni

Alessandro Angioi

Michelangelo Puliga

08 Sep 2025

Benchmarking for Domain-Specific LLMs: A Case Study on Academia and Beyond

10 Aug 2025

Beyond the Leaderboard: Rethinking Medical Benchmarks for Large Language Models

LM&MA ELM AI4MH

06 Aug 2025

It's Not the Target, It's the Background: Rethinking Infrared Small Target Detection via Deep Patch-Free Low-Rank Representations

IEEE Transactions on Geoscience and Remote Sensing (IEEE TGRS), 2025

12 Jun 2025

MIRIAD: Augmenting LLMs with millions of medical query-response pairs

Salman Abdullah

Sophie Ostmeier

Maximilian Purk

06 Jun 2025

High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning

Archie Sravankumar

Luke Zettlemoyer

04 Jun 2025

Trustworthy Medical Question Answering: An Evaluation-Centric Survey

Robert E. Mercer

Sudipta Singha Roy

04 Jun 2025

Beyond Memorization: A Rigorous Evaluation Framework for Medical Knowledge Editing

04 Jun 2025

BioHopR: A Benchmark for Multi-Hop, Multi-Answer Reasoning in Biomedical Domain

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

28 May 2025

PerMedCQA: Benchmarking Large Language Models on Medical Consumer Question Answering in Persian Language

Milad Mohammadi

23 May 2025

TAGS: A Test-Time Generalist-Specialist Framework with Retrieval-Augmented Reasoning and Verification

23 May 2025

Continually Self-Improving Language Models for Bariatric Surgery Question-Answering

Yash Kumar Atri

Thomas Hartvigsen

22 May 2025

NurValues: Real-World Nursing Values Evaluation for Large Language Models in Clinical Context

13 May 2025

TeleEval-OS: Performance evaluations of large language models for operations scheduling

...

06 May 2025

A Case Study Exploring the Current Landscape of Synthetic Medical Record Generation with Commercial LLMs

ACM Conference on Health, Inference, and Learning (CHIL), 2025

Zhirong Bella Yu

20 Apr 2025

DMind Benchmark: Toward a Holistic Assessment of LLM Capabilities across the Web3 Domain

...

18 Apr 2025

IHC-LLMiner: Automated extraction of tumour immunohistochemical profiles from PubMed abstracts using large language models

Michal W. S. Ong

Daniel W. Rogalsky

Manuel Rodriguez-Justo

01 Apr 2025

3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark

Amina Miftakhova

Artemiy Tereshchenko

Andrey Savchenko

26 Mar 2025

MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning

...

Mark B. Gerstein

AI4MH LRM ELM LM&MA

10 Mar 2025

Application of integrated gradients explainability to sociopsychological semantic markers

Magdalena Formanowicz

Maria Laura Bettinsoli

Caterina Suitner

06 Mar 2025

From Retrieval to Generation: Comparing Different Approaches

Abdelrahman Abdallah

Jamshid Mozafari

27 Feb 2025

A Benchmark for Long-Form Medical Question Answering

Pedram Hosseini

Bryceton G. Thomas

Saeed Hassanpour

ELM LM&MA AI4MH

14 Nov 2024

Evidence Is All You Need: Ordering Imaging Studies via Language Model Alignment with the ACR Appropriateness Criteria

Communications Medicine (Commun Med), 2024

Charles E. Kahn Jr.

27 Sep 2024

Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions

28 Feb 2024
