HEAD-QA: A Healthcare Dataset for Complex Reasoning


Annual Meeting of the Association for Computational Linguistics (ACL), 2019
11 June 2019
David Vilares
Carlos Gómez-Rodríguez

Papers citing "HEAD-QA: A Healthcare Dataset for Complex Reasoning"

50 / 72 papers shown
OctoMed: Data Recipes for State-of-the-Art Multimodal Medical Reasoning
Timothy Ossowski
Sheng Zhang
Qianchu Liu
Guanghui Qin
Reuben Tan
Tristan Naumann
Junjie Hu
Hoifung Poon
LRM
283
2
0
28 Nov 2025
Structured Prompts Improve Evaluation of Language Models
Asad Aali
Muhammad Ahmed Mohsin
Vasiliki Bikia
Arnav Singhvi
Richard Gaus
...
Sanmi Koyejo
Emily Alsentzer
Christopher Potts
N. Shah
Akshay Chaudhari
ELMLRM
326
1
0
25 Nov 2025
HEAD-QA v2: Expanding a Healthcare Benchmark for Reasoning
Alexis Correa-Guillén
Carlos Gómez-Rodríguez
David Vilares
CMLELMLRM
323
0
0
19 Nov 2025
IMB: An Italian Medical Benchmark for Question Answering
Antonio Romano
Giuseppe Riccio
Mariano Barone
Marco Postiglione
V. Moscato
AI4MH
266
1
0
21 Oct 2025
Rethinking RL Evaluation: Can Benchmarks Truly Reveal Failures of RL Methods?
Z. Chen
Yiming Zhang
Hengguang Zhou
Zenghui Ding
Yining Sun
Cho-Jui Hsieh
OffRLALMELM
132
0
0
12 Oct 2025
From Guidelines to Guarantees: A Graph-Based Evaluation Harness for Domain-Specific Evaluation of LLMs
Jessica Lundin
Guillaume Chabot-Couture
ELM
136
1
0
28 Aug 2025
Proximal Supervised Fine-Tuning
Wenhong Zhu
Ruobing Xie
R. Wang
Xingwu Sun
Di Wang
Pengfei Liu
OffRL
149
4
0
25 Aug 2025
HealthBranches: Synthesizing Clinically-Grounded Question Answering Datasets via Decision Pathways
Cristian Cosentino
Annamaria Defilippo
Marco Dossena
Christopher Irwin
Sara Joubbi
Pietro Lio'
LM&MAAI4MH
165
1
0
10 Aug 2025
Beyond the Leaderboard: Rethinking Medical Benchmarks for Large Language Models
Zizhan Ma
Wenxuan Wang
G. Yu
Yiu-Fai Cheung
Meidan Ding
J. Tang
Wenting Chen
LinLin Shen
LM&MAELMAI4MH
269
3
0
06 Aug 2025
MedVLThinker: Simple Baselines for Multimodal Medical Reasoning
Xiaoke Huang
Juncheng Wu
Hui Liu
Xianfeng Tang
Yuyin Zhou
ReLMLRM
327
15
0
04 Aug 2025
FPEdit: Robust LLM Fingerprinting through Localized Parameter Editing
Shida Wang
Chaohu Liu
Yubo Wang
Linli Xu
KELM
296
3
0
04 Aug 2025
MediQAl: A French Medical Question Answering Dataset for Knowledge and Reasoning Evaluation
Adrien Bazoge
ELM
210
3
0
28 Jul 2025
A Hybrid Early-Exit Algorithm for Large Language Models Based on Space Alignment Decoding (SPADE)
Bowen Zheng
Ming Ma
Zhongqiao Lin
Tianming Yang
179
1
0
23 Jul 2025
Train-before-Test Harmonizes Language Model Rankings
Guanhua Zhang
Ricardo Dominguez-Olmedo
Moritz Hardt
ALM
265
6
0
07 Jul 2025
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
Maggie Huan
Yuetai Li
Tuney Zheng
Xiaoyu Xu
Seungone Kim
Minxin Du
Radha Poovendran
Graham Neubig
Xiang Yue
LRMELM
241
67
0
01 Jul 2025
LASER: Stratified Selective Sampling for Instruction Tuning with Dedicated Scoring Strategy
Paramita Mirza
Lucas Weber
Fabian Küch
415
0
0
28 May 2025
Research Community Perspectives on "Intelligence" and Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Bertram Højer
Terne Sasha Thorn Jakobsen
Anna Rogers
Stefan Heinrich
228
3
0
27 May 2025
Disentangling Reasoning and Knowledge in Medical Large Language Models
Rahul Thapa
Qingyang Wu
Kevin Wu
Harrison Zhang
Angela Zhang
...
Joseph Boen
Shriya Reddy
Ben Athiwaratkun
Shuaiwen Leon Song
James Zou
ELMAI4MHLM&MALRM
473
12
0
16 May 2025
GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning
Jian Zhao
Runze Liu
Kaiyan Zhang
Zhimu Zhou
Junqi Gao
...
Jiafei Lyu
Zhouyi Qian
Biqing Qi
Xiu Li
Bowen Zhou
OffRLLRM
524
28
0
01 Apr 2025
Advancing Problem-Based Learning in Biomedical Engineering in the Era of Generative AI
Micky C. Nnamdi
J. Ben Tamo
Wenqi Shi
Hang Wu
May D. Wang
AI4CE
314
3
0
20 Mar 2025
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning
Xiangru Tang
Daniel Shao
Jiwoong Sohn
Jiapeng Chen
Jiayi Zhang
...
Yilun Zhao
Chenglin Wu
Wenqi Shi
Arman Cohan
Mark B. Gerstein
AI4MHLRMELMLM&MA
365
34
0
10 Mar 2025
BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning
Haiteng Zhao
Chang Ma
FangZhi Xu
Lingpeng Kong
Zhi-Hong Deng
LRM
583
11
0
23 Feb 2025
WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
João Matos
Shan Chen
Siena Placino
Yingya Li
Juan Carlos Climent Pardo
...
Hugo J. W. L. Aerts
Leo Anthony Celi
A. I. Wong
Danielle S. Bitterman
Jack Gallifant
229
9
0
16 Oct 2024
Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts
International Conference on Learning Representations (ICLR), 2024
Guorui Zheng
Xidong Wang
Juhao Liang
Nuo Chen
Yuping Zheng
Benyou Wang
MoE
361
11
0
14 Oct 2024
CasiMedicos-Arg: A Medical Question Answering Dataset Annotated with Explanatory Argumentative Structures
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Ekaterina Sviridova
Anar Yeginbergen
A. Estarrona
Elena Cabrio
S. Villata
Rodrigo Agerri
309
10
0
07 Oct 2024
Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Clément Christophe
Tathagata Raha
Svetlana Maslenkova
Muhammad Umar Salman
Praveen K Kanithi
Marco AF Pimentel
Shadab Khan
LM&MA
265
4
0
23 Sep 2024
Med42-v2: A Suite of Clinical LLMs
Clément Christophe
Praveen K Kanithi
Tathagata Raha
Shadab Khan
Marco AF Pimentel
ELMLM&MAAI4MH
338
78
0
12 Aug 2024
Data Contamination Report from the 2024 CONDA Shared Task
Oscar Sainz
Iker García-Ferrero
Alon Jacovi
Jonas Hanselle
Yanai Elazar
...
Yu-Min Tseng
Vishaal Udandarao
Zengzhi Wang
Ruijie Xu
Jinglin Yang
320
18
0
31 Jul 2024
CollectiveSFT: Scaling Large Language Models for Chinese Medical Benchmark with Collective Instructions in Healthcare
Jingwei Zhu
Minghuan Tan
Min Yang
Ruixue Li
Hamid Alinejad-Rokny
ALMLM&MA
277
1
0
29 Jul 2024
Stay Tuned: An Empirical Study of the Impact of Hyperparameters on LLM Tuning in Real-World Applications
Alon Halfon
Shai Gretz
Ofir Arviv
Artem Spector
Orith Toledo-Ronen
Yoav Katz
L. Ein-Dor
Michal Shmueli-Scheuer
Noam Slonim
286
8
0
25 Jul 2024
M-QALM: A Benchmark to Assess Clinical Reading Comprehension and Knowledge Recall in Large Language Models via Question Answering
Anand Subramanian
Viktor Schlegel
Abhinav Ramesh Kashyap
Thanh-Tung Nguyen
Vijay Prakash Dwivedi
Stefan Winkler
ELMLM&MAAI4MH
196
6
0
06 Jun 2024
Med42 -- Evaluating Fine-Tuning Strategies for Medical LLMs: Full-Parameter vs. Parameter-Efficient Approaches
Clément Christophe
Praveen K Kanithi
Prateek Munjal
Tathagata Raha
Nasir Hayat
...
Charles Chen
Natalia Vassilieva
Boulbaba Ben Amor
Marco AF Pimentel
Shadab Khan
AI4MHLM&MA
265
70
0
23 Apr 2024
CT-Agent: Clinical Trial Multi-Agent with Large Language Model-based Reasoning
Ling Yue
Tianfan Fu
LLMAGLRMELM
181
35
0
23 Apr 2024
SciDaSynth: Interactive Structured Data Extraction from Scientific Literature with Large Language Model
Campbell Systematic Reviews (Campbell Syst Rev), 2024
Xingbo Wang
S. Huey
Rui Sheng
Saurabh Mehta
Fei Wang
407
4
0
21 Apr 2024
Improving Health Question Answering with Reliable and Time-Aware Evidence Retrieval
Juraj Vladika
Florian Matthes
RALM
277
14
0
12 Apr 2024
MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering
Inigo Alonso
Maite Oronoz
Rodrigo Agerri
AI4MHLM&MAELM
268
67
1
08 Apr 2024
Apollo: A Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People
Xidong Wang
Nuo Chen
Junying Chen
Yan Hu
Yidong Wang
Xiangbo Wu
Anningzhe Gao
Xiang Wan
Haizhou Li
Benyou Wang
LM&MA
348
46
0
06 Mar 2024
KorMedMCQA: Multi-Choice Question Answering Benchmark for Korean Healthcare Professional Licensing Examinations
Sunjun Kweon
B. Choi
Minkyu Kim
Rae Woong Park
Edward Choi
ELM
220
20
0
03 Mar 2024
Towards Building Multilingual Language Model for Medicine
Pengcheng Qiu
Chaoyi Wu
Xiaoman Zhang
Weixiong Lin
Haicheng Wang
Ya Zhang
Yanfeng Wang
Weidi Xie
LM&MAELM
549
172
0
21 Feb 2024
OLMo: Accelerating the Science of Language Models
Dirk Groeneveld
Iz Beltagy
Pete Walsh
Akshita Bhagia
Rodney Michael Kinney
...
Jesse Dodge
Kyle Lo
Luca Soldaini
Noah A. Smith
Hanna Hajishirzi
OSLM
716
604
0
01 Feb 2024
Instructional Fingerprinting of Large Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Lyne Tchapmi
Fei Wang
Mingyu Derek Ma
Pang Wei Koh
Chaowei Xiao
Muhao Chen
WaLM
308
71
0
21 Jan 2024
Catwalk: A Unified Language Model Evaluation Framework for Many Datasets
Dirk Groeneveld
Anas Awadalla
Iz Beltagy
Akshita Bhagia
Ian H. Magnusson
Hao Peng
Oyvind Tafjord
Pete Walsh
Kyle Richardson
Jesse Dodge
283
2
0
15 Dec 2023
Explanatory Argument Extraction of Correct Answers in Resident Medical Exams
Iakes Goenaga
Aitziber Atutxa
Koldo Gojenola
Maite Oronoz
Rodrigo Agerri
ELM
250
10
0
01 Dec 2023
Ascle: A Python Natural Language Processing Toolkit for Medical Text Generation
Rui Yang
Qingcheng Zeng
Keen You
Yujie Qiao
Lucas Huang
...
Dragomir R. Radev
Zhiyong Lu
Hua Xu
Qingyu Chen
Irene Li
ELMLM&MA
271
3
0
28 Nov 2023
AlpaCare: Instruction-tuned Large Language Models for Medical Application
Xinlu Zhang
Chenxin Tian
Xianjun Yang
Lichang Chen
Zekun Li
Linda R. Petzold
LM&MA
557
92
0
23 Oct 2023
Emerging Challenges in Personalized Medicine: Assessing Demographic Effects on Biomedical Question Answering Systems
International Joint Conference on Natural Language Processing (IJCNLP), 2023
Sagi Shaier
Kevin Bennett
Lawrence E Hunter
Katharina von der Wense
219
0
0
16 Oct 2023
Med-HALT: Medical Domain Hallucination Test for Large Language Models
Conference on Computational Natural Language Learning (CoNLL), 2023
Ankit Pal
Logesh Kumar Umapathi
Malaikannan Sankarasubbu
HILMLM&MAVLM
372
231
0
28 Jul 2023
A Comprehensive Overview of Large Language Models
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023
Humza Naveed
Asad Ullah Khan
Shi Qiu
Muhammad Saqib
Saeed Anwar
Muhammad Usman
Naveed Akhtar
Nick Barnes
Lin Wang
OffRL
1.2K
1,425
0
12 Jul 2023
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only
Guilherme Penedo
Quentin Malartic
Daniel Hesslow
Ruxandra-Aimée Cojocaru
Alessandro Cappelli
Hamza Alobeidli
B. Pannier
Ebtesam Almazrouei
Julien Launay
514
918
0
01 Jun 2023
RWKV: Reinventing RNNs for the Transformer Era
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Bo Peng
Eric Alcaide
Quentin G. Anthony
Alon Albalak
Samuel Arcadinho
...
Qihang Zhao
P. Zhou
Qinghua Zhou
Jian Zhu
Rui-Jie Zhu
737
946
0
22 May 2023