MedExQA: Medical Question Answering Benchmark with Multiple Explanations

10 June 2024
Yunsoo Kim
Jinge Wu
Yusuf Abdulle
Honghan Wu
    ELM

Papers citing "MedExQA: Medical Question Answering Benchmark with Multiple Explanations"

29 / 29 papers shown
Safer in Translation? Presupposition Robustness in Indic Languages
Aadi Palnitkar
Arjun Suresh
Rishi Rajesh
Puneet Puli
127
0
0
03 Nov 2025
CGBench: Benchmarking Language Model Scientific Reasoning for Clinical Genetics Research
Owen Queen
Harrison Zhang
James Zou
ELM, LM&MA, LRM
196
0
0
13 Oct 2025
Inflated Excellence or True Performance? Rethinking Medical Diagnostic Benchmarks with Dynamic Evaluation
Xiangxu Zhang
Lei Li
Yanyun Zhou
Xiao Zhou
Y. Zhang
Xian Wu
LM&MA, ELM
220
1
0
10 Oct 2025
Risk Profiling and Modulation for LLMs
Yikai Wang
Xiaocheng Li
Guanting Chen
193
1
0
27 Sep 2025
Filling in the Clinical Gaps in Benchmark: Case for HealthBench for the Japanese medical system
Shohei Hisada
Endo Sunao
Himi Yamato
Shoko Wakamiya
Eiji Aramaki
ELM
229
0
0
22 Sep 2025
MedBench-IT: A Comprehensive Benchmark for Evaluating Large Language Models on Italian Medical Entrance Examinations
Ruggero Marino Lazzaroni
Alessandro Angioi
Michelangelo Puliga
Davide Sanna
Roberto Marras
LM&MA, ELM
178
1
0
08 Sep 2025
Benchmarking for Domain-Specific LLMs: A Case Study on Academia and Beyond
Rubing Chen
Jiaxin Wu
Jian Wang
Xulu Zhang
Wenqi Fan
Chenghua Lin
Xiao-Yong Wei
Qing Li
ALM
324
0
0
10 Aug 2025
Beyond the Leaderboard: Rethinking Medical Benchmarks for Large Language Models
Zizhan Ma
Wenxuan Wang
G. Yu
Yiu-Fai Cheung
Meidan Ding
J. Tang
Wenting Chen
LinLin Shen
LM&MA, ELM, AI4MH
273
4
0
06 Aug 2025
It's Not the Target, It's the Background: Rethinking Infrared Small Target Detection via Deep Patch-Free Low-Rank Representations
IEEE Transactions on Geoscience and Remote Sensing (IEEE TGRS), 2025
Guoyi Zhang
Guangsheng Xu
Siyang Chen
Han Wang
Xiaohu Zhang
650
0
0
12 Jun 2025
MIRIAD: Augmenting LLMs with millions of medical query-response pairs
Qinyue Zheng
Salman Abdullah
Sam Rawal
C. Zakka
Sophie Ostmeier
Maximilian Purk
E. Reis
Eric J. Topol
J. Leskovec
Michael Moor
LM&MA, AI4MH
337
6
0
06 Jun 2025
High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning
Tim Franzmeyer
Archie Sravankumar
Lijuan Liu
Yuning Mao
Rui Hou
Sinong Wang
Jakob Foerster
Luke Zettlemoyer
Madian Khabsa
KELM, ALM
296
0
0
04 Jun 2025
Trustworthy Medical Question Answering: An Evaluation-Centric Survey
Yinuo Wang
Robert E. Mercer
Frank Rudzicz
Sudipta Singha Roy
Pengjie Ren
Zhumin Chen
Xindi Wang
ELM
289
6
0
04 Jun 2025
Beyond Memorization: A Rigorous Evaluation Framework for Medical Knowledge Editing
Shigeng Chen
Linhao Luo
Zhangchi Qiu
Yanan Cao
Carl Yang
Shirui Pan
KELM
481
2
0
04 Jun 2025
BioHopR: A Benchmark for Multi-Hop, Multi-Answer Reasoning in Biomedical Domain
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yunsoo Kim
Yusuf Abdulle
Honghan Wu
LRM
190
9
0
28 May 2025
PerMedCQA: Benchmarking Large Language Models on Medical Consumer Question Answering in Persian Language
Naghmeh Jamali
Milad Mohammadi
Danial Baledi
Zahra Rezvani
Hesham Faili
LM&MA, ELM
277
1
0
23 May 2025
TAGS: A Test-Time Generalist-Specialist Framework with Retrieval-Augmented Reasoning and Verification
Jianghao Wu
Feilong Tang
Yulong Li
Ming Hu
Haochen Xue
Shoaib Jameel
Yutong Xie
Imran Razzak
LRM
207
2
0
23 May 2025
Continually Self-Improving Language Models for Bariatric Surgery Question-Answering
Yash Kumar Atri
Thomas H Shin
Thomas Hartvigsen
306
1
0
22 May 2025
NurValues: Real-World Nursing Values Evaluation for Large Language Models in Clinical Context
Ben Yao
Qiuchi Li
Yazhou Zhang
Siyu Yang
Bohan Zhang
Prayag Tiwari
Jing Qin
456
0
0
13 May 2025
TeleEval-OS: Performance evaluations of large language models for operations scheduling
Yanyan Wang
Yingying Wang
Junli Liang
Yin Xu
Yunlong Liu
...
Fei Li
Long Zhao
Kuang Xu
Qi Song
Xiangyang Li
AI4TS
201
0
0
06 May 2025
A Case Study Exploring the Current Landscape of Synthetic Medical Record Generation with Commercial LLMs
ACM Conference on Health, Inference, and Learning (CHIL), 2025
Yihan Lin
Zhirong Bella Yu
Simon Lee
SyDa
504
9
0
20 Apr 2025
DMind Benchmark: Toward a Holistic Assessment of LLM Capabilities across the Web3 Domain
Enhao Huang
Rainy Sun
Anya Reese
Alex Chen
...
Frank Li
Hobert Wong
Gang Zhao
Ziang Ling
Lowes Yang
ALM, ELM
457
0
0
18 Apr 2025
IHC-LLMiner: Automated extraction of tumour immunohistochemical profiles from PubMed abstracts using large language models
Yunsoo Kim
Michal W. S. Ong
Daniel W. Rogalsky
Manuel Rodriguez-Justo
Honghan Wu
Adam P. Levine
218
0
0
01 Apr 2025
3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark
Ivan Sviridov
Amina Miftakhova
Artemiy Tereshchenko
Galina Zubkova
Pavel Blinov
Andrey Savchenko
LM&MA
417
5
0
26 Mar 2025
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning
Xiangru Tang
Daniel Shao
Jiwoong Sohn
Jiapeng Chen
Jiayi Zhang
...
Yilun Zhao
Chenglin Wu
Wenqi Shi
Arman Cohan
Mark B. Gerstein
AI4MH, LRM, ELM, LM&MA
368
35
0
10 Mar 2025
Application of integrated gradients explainability to sociopsychological semantic markers
Ali Aghababaei
Jan Nikadon
Magdalena Formanowicz
Maria Laura Bettinsoli
Carmen Cervone
Caterina Suitner
Tomaso Erseghe
266
1
0
06 Mar 2025
From Retrieval to Generation: Comparing Different Approaches
Abdelrahman Abdallah
Jamshid Mozafari
Bhawna Piryani
Mohammed Ali
Adam Jatowt
RALM
385
4
0
27 Feb 2025
A Benchmark for Long-Form Medical Question Answering
Pedram Hosseini
Jessica M. Sin
Bing Ren
Bryceton G. Thomas
Elnaz Nouri
Ali Farahanchi
Saeed Hassanpour
ELM, LM&MA, AI4MH
310
20
0
14 Nov 2024
Evidence Is All You Need: Ordering Imaging Studies via Language Model Alignment with the ACR Appropriateness Criteria
Communications Medicine (Commun Med), 2024
Michael S. Yao
Allison Chae
Charles E. Kahn Jr.
W. Witschey
James C. Gee
H. Sagreiya
Osbert Bastani
342
0
0
27 Sep 2024
Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions
Hanjie Chen
Zhouxiang Fang
Yash Singla
Mark Dredze
ELM, AI4MH
571
110
0
28 Feb 2024