1906.04701
HEAD-QA: A Healthcare Dataset for Complex Reasoning
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
11 June 2019
David Vilares
Carlos Gómez-Rodríguez
Papers citing
"HEAD-QA: A Healthcare Dataset for Complex Reasoning"
50 / 72 papers shown
OctoMed: Data Recipes for State-of-the-Art Multimodal Medical Reasoning
Timothy Ossowski
Sheng Zhang
Qianchu Liu
Guanghui Qin
Reuben Tan
Tristan Naumann
Junjie Hu
Hoifung Poon
LRM
283
2
0
28 Nov 2025
Structured Prompts Improve Evaluation of Language Models
Asad Aali
Muhammad Ahmed Mohsin
Vasiliki Bikia
Arnav Singhvi
Richard Gaus
...
Sanmi Koyejo
Emily Alsentzer
Christopher Potts
N. Shah
Akshay Chaudhari
ELM
LRM
326
1
0
25 Nov 2025
HEAD-QA v2: Expanding a Healthcare Benchmark for Reasoning
Alexis Correa-Guillén
Carlos Gómez-Rodríguez
David Vilares
CML
ELM
LRM
323
0
0
19 Nov 2025
IMB: An Italian Medical Benchmark for Question Answering
Antonio Romano
Giuseppe Riccio
Mariano Barone
Marco Postiglione
V. Moscato
AI4MH
266
1
0
21 Oct 2025
Rethinking RL Evaluation: Can Benchmarks Truly Reveal Failures of RL Methods?
Z. Chen
Yiming Zhang
Hengguang Zhou
Zenghui Ding
Yining Sun
Cho-Jui Hsieh
OffRL
ALM
ELM
132
0
0
12 Oct 2025
From Guidelines to Guarantees: A Graph-Based Evaluation Harness for Domain-Specific Evaluation of LLMs
Jessica Lundin
Guillaume Chabot-Couture
ELM
136
1
0
28 Aug 2025
Proximal Supervised Fine-Tuning
Wenhong Zhu
Ruobing Xie
R. Wang
Xingwu Sun
Di Wang
Pengfei Liu
OffRL
149
4
0
25 Aug 2025
HealthBranches: Synthesizing Clinically-Grounded Question Answering Datasets via Decision Pathways
Cristian Cosentino
Annamaria Defilippo
Marco Dossena
Christopher Irwin
Sara Joubbi
Pietro Liò
LM&MA
AI4MH
165
1
0
10 Aug 2025
Beyond the Leaderboard: Rethinking Medical Benchmarks for Large Language Models
Zizhan Ma
Wenxuan Wang
G. Yu
Yiu-Fai Cheung
Meidan Ding
J. Tang
Wenting Chen
LinLin Shen
LM&MA
ELM
AI4MH
269
3
0
06 Aug 2025
MedVLThinker: Simple Baselines for Multimodal Medical Reasoning
Xiaoke Huang
Juncheng Wu
Hui Liu
Xianfeng Tang
Yuyin Zhou
ReLM
LRM
327
15
0
04 Aug 2025
FPEdit: Robust LLM Fingerprinting through Localized Parameter Editing
Shida Wang
Chaohu Liu
Yubo Wang
Linli Xu
KELM
296
3
0
04 Aug 2025
MediQAl: A French Medical Question Answering Dataset for Knowledge and Reasoning Evaluation
Adrien Bazoge
ELM
210
3
0
28 Jul 2025
A Hybrid Early-Exit Algorithm for Large Language Models Based on Space Alignment Decoding (SPADE)
Bowen Zheng
Ming Ma
Zhongqiao Lin
Tianming Yang
179
1
0
23 Jul 2025
Train-before-Test Harmonizes Language Model Rankings
Guanhua Zhang
Ricardo Dominguez-Olmedo
Moritz Hardt
ALM
265
6
0
07 Jul 2025
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
Maggie Huan
Yuetai Li
Tuney Zheng
Xiaoyu Xu
Seungone Kim
Minxin Du
Radha Poovendran
Graham Neubig
Xiang Yue
LRM
ELM
241
67
0
01 Jul 2025
LASER: Stratified Selective Sampling for Instruction Tuning with Dedicated Scoring Strategy
Paramita Mirza
Lucas Weber
Fabian Küch
415
0
0
28 May 2025
Research Community Perspectives on "Intelligence" and Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Bertram Højer
Terne Sasha Thorn Jakobsen
Anna Rogers
Stefan Heinrich
228
3
0
27 May 2025
Disentangling Reasoning and Knowledge in Medical Large Language Models
Rahul Thapa
Qingyang Wu
Kevin Wu
Harrison Zhang
Angela Zhang
...
Joseph Boen
Shriya Reddy
Ben Athiwaratkun
Shuaiwen Leon Song
James Zou
ELM
AI4MH
LM&MA
LRM
473
12
0
16 May 2025
GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning
Jian Zhao
Runze Liu
Kaiyan Zhang
Zhimu Zhou
Junqi Gao
...
Jiafei Lyu
Zhouyi Qian
Biqing Qi
Xiu Li
Bowen Zhou
OffRL
LRM
524
28
0
01 Apr 2025
Advancing Problem-Based Learning in Biomedical Engineering in the Era of Generative AI
Micky C. Nnamdi
J. Ben Tamo
Wenqi Shi
Hang Wu
May D. Wang
AI4CE
314
3
0
20 Mar 2025
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning
Xiangru Tang
Daniel Shao
Jiwoong Sohn
Jiapeng Chen
Jiayi Zhang
...
Yilun Zhao
Chenglin Wu
Wenqi Shi
Arman Cohan
Mark B. Gerstein
AI4MH
LRM
ELM
LM&MA
365
34
0
10 Mar 2025
BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning
Haiteng Zhao
Chang Ma
FangZhi Xu
Lingpeng Kong
Zhi-Hong Deng
LRM
583
11
0
23 Feb 2025
WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
João Matos
Shan Chen
Siena Placino
Yingya Li
Juan Carlos Climent Pardo
...
Hugo J. W. L. Aerts
Leo Anthony Celi
A. I. Wong
Danielle S. Bitterman
Jack Gallifant
229
9
0
16 Oct 2024
Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts
International Conference on Learning Representations (ICLR), 2024
Guorui Zheng
Xidong Wang
Juhao Liang
Nuo Chen
Yuping Zheng
Benyou Wang
MoE
361
11
0
14 Oct 2024
CasiMedicos-Arg: A Medical Question Answering Dataset Annotated with Explanatory Argumentative Structures
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Ekaterina Sviridova
Anar Yeginbergen
A. Estarrona
Elena Cabrio
S. Villata
Rodrigo Agerri
309
10
0
07 Oct 2024
Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Clément Christophe
Tathagata Raha
Svetlana Maslenkova
Muhammad Umar Salman
Praveen K Kanithi
Marco AF Pimentel
Shadab Khan
LM&MA
265
4
0
23 Sep 2024
Med42-v2: A Suite of Clinical LLMs
Clément Christophe
Praveen K Kanithi
Tathagata Raha
Shadab Khan
Marco AF Pimentel
ELM
LM&MA
AI4MH
338
78
0
12 Aug 2024
Data Contamination Report from the 2024 CONDA Shared Task
Oscar Sainz
Iker García-Ferrero
Alon Jacovi
Jonas Hanselle
Yanai Elazar
...
Yu-Min Tseng
Vishaal Udandarao
Zengzhi Wang
Ruijie Xu
Jinglin Yang
320
18
0
31 Jul 2024
CollectiveSFT: Scaling Large Language Models for Chinese Medical Benchmark with Collective Instructions in Healthcare
Jingwei Zhu
Minghuan Tan
Min Yang
Ruixue Li
Hamid Alinejad-Rokny
ALM
LM&MA
277
1
0
29 Jul 2024
Stay Tuned: An Empirical Study of the Impact of Hyperparameters on LLM Tuning in Real-World Applications
Alon Halfon
Shai Gretz
Ofir Arviv
Artem Spector
Orith Toledo-Ronen
Yoav Katz
L. Ein-Dor
Michal Shmueli-Scheuer
Noam Slonim
286
8
0
25 Jul 2024
M-QALM: A Benchmark to Assess Clinical Reading Comprehension and Knowledge Recall in Large Language Models via Question Answering
Anand Subramanian
Viktor Schlegel
Abhinav Ramesh Kashyap
Thanh-Tung Nguyen
Vijay Prakash Dwivedi
Stefan Winkler
ELM
LM&MA
AI4MH
196
6
0
06 Jun 2024
Med42 -- Evaluating Fine-Tuning Strategies for Medical LLMs: Full-Parameter vs. Parameter-Efficient Approaches
Clément Christophe
Praveen K Kanithi
Prateek Munjal
Tathagata Raha
Nasir Hayat
...
Charles Chen
Natalia Vassilieva
Boulbaba Ben Amor
Marco AF Pimentel
Shadab Khan
AI4MH
LM&MA
265
70
0
23 Apr 2024
CT-Agent: Clinical Trial Multi-Agent with Large Language Model-based Reasoning
Ling Yue
Tianfan Fu
LLMAG
LRM
ELM
181
35
0
23 Apr 2024
SciDaSynth: Interactive Structured Data Extraction from Scientific Literature with Large Language Model
Campbell Systematic Reviews (Campbell Syst Rev), 2024
Xingbo Wang
S. Huey
Rui Sheng
Saurabh Mehta
Fei Wang
407
4
0
21 Apr 2024
Improving Health Question Answering with Reliable and Time-Aware Evidence Retrieval
Juraj Vladika
Florian Matthes
RALM
277
14
0
12 Apr 2024
MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering
Inigo Alonso
Maite Oronoz
Rodrigo Agerri
AI4MH
LM&MA
ELM
268
67
1
08 Apr 2024
Apollo: A Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People
Xidong Wang
Nuo Chen
Junying Chen
Yan Hu
Yidong Wang
Xiangbo Wu
Anningzhe Gao
Xiang Wan
Haizhou Li
Benyou Wang
LM&MA
348
46
0
06 Mar 2024
KorMedMCQA: Multi-Choice Question Answering Benchmark for Korean Healthcare Professional Licensing Examinations
Sunjun Kweon
B. Choi
Minkyu Kim
Rae Woong Park
Edward Choi
ELM
220
20
0
03 Mar 2024
Towards Building Multilingual Language Model for Medicine
Pengcheng Qiu
Chaoyi Wu
Xiaoman Zhang
Weixiong Lin
Haicheng Wang
Ya Zhang
Yanfeng Wang
Weidi Xie
LM&MA
ELM
549
172
0
21 Feb 2024
OLMo: Accelerating the Science of Language Models
Dirk Groeneveld
Iz Beltagy
Pete Walsh
Akshita Bhagia
Rodney Michael Kinney
...
Jesse Dodge
Kyle Lo
Luca Soldaini
Noah A. Smith
Hanna Hajishirzi
OSLM
716
604
0
01 Feb 2024
Instructional Fingerprinting of Large Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Lyne Tchapmi
Fei Wang
Mingyu Derek Ma
Pang Wei Koh
Chaowei Xiao
Muhao Chen
WaLM
308
71
0
21 Jan 2024
Catwalk: A Unified Language Model Evaluation Framework for Many Datasets
Dirk Groeneveld
Anas Awadalla
Iz Beltagy
Akshita Bhagia
Ian H. Magnusson
Hao Peng
Oyvind Tafjord
Pete Walsh
Kyle Richardson
Jesse Dodge
283
2
0
15 Dec 2023
Explanatory Argument Extraction of Correct Answers in Resident Medical Exams
Iakes Goenaga
Aitziber Atutxa
Koldo Gojenola
Maite Oronoz
Rodrigo Agerri
ELM
250
10
0
01 Dec 2023
Ascle: A Python Natural Language Processing Toolkit for Medical Text Generation
Rui Yang
Qingcheng Zeng
Keen You
Yujie Qiao
Lucas Huang
...
Dragomir R. Radev
Zhiyong Lu
Hua Xu
Qingyu Chen
Irene Li
ELM
LM&MA
271
3
0
28 Nov 2023
AlpaCare: Instruction-tuned Large Language Models for Medical Application
Xinlu Zhang
Chenxin Tian
Xianjun Yang
Lichang Chen
Zekun Li
Linda R. Petzold
LM&MA
557
92
0
23 Oct 2023
Emerging Challenges in Personalized Medicine: Assessing Demographic Effects on Biomedical Question Answering Systems
International Joint Conference on Natural Language Processing (IJCNLP), 2023
Sagi Shaier
Kevin Bennett
Lawrence E Hunter
Katharina von der Wense
219
0
0
16 Oct 2023
Med-HALT: Medical Domain Hallucination Test for Large Language Models
Conference on Computational Natural Language Learning (CoNLL), 2023
Ankit Pal
Logesh Kumar Umapathi
Malaikannan Sankarasubbu
HILM
LM&MA
VLM
372
231
0
28 Jul 2023
A Comprehensive Overview of Large Language Models
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023
Humza Naveed
Asad Ullah Khan
Shi Qiu
Muhammad Saqib
Saeed Anwar
Muhammad Usman
Naveed Akhtar
Nick Barnes
Lin Wang
OffRL
1.2K
1,425
0
12 Jul 2023
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only
Guilherme Penedo
Quentin Malartic
Daniel Hesslow
Ruxandra-Aimée Cojocaru
Alessandro Cappelli
Hamza Alobeidli
B. Pannier
Ebtesam Almazrouei
Julien Launay
514
918
0
01 Jun 2023
RWKV: Reinventing RNNs for the Transformer Era
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Bo Peng
Eric Alcaide
Quentin G. Anthony
Alon Albalak
Samuel Arcadinho
...
Qihang Zhao
P. Zhou
Qinghua Zhou
Jian Zhu
Rui-Jie Zhu
737
946
0
22 May 2023