2303.13375
Cited By
Capabilities of GPT-4 on Medical Challenge Problems
20 March 2023
Harsha Nori
Nicholas King
S. McKinney
Dean Carignan
Eric Horvitz
LM&MA
ELM
AI4MH
Papers citing
"Capabilities of GPT-4 on Medical Challenge Problems"
50 / 370 papers shown
Title
High-Fidelity Pseudo-label Generation by Large Language Models for Training Robust Radiology Report Classifiers
Brian Wong
Kaito Tanaka
20
0
0
03 May 2025
Knowledge-Augmented Language Models Interpreting Structured Chest X-Ray Findings
Alexander Davis
Rafael Souza
Jia-Hao Lim
34
0
0
03 May 2025
Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA
Xuanzhao Dong
Wenhui Zhu
Hao Wang
Xiwen Chen
Peijie Qiu
Rui Yin
Yi Su
Y. Wang
RALM
MedIm
42
0
0
30 Apr 2025
Multimodal Large Language Models for Medicine: A Comprehensive Survey
Jiarui Ye
Hao Tang
LM&MA
76
0
0
29 Apr 2025
m-KAILIN: Knowledge-Driven Agentic Scientific Corpus Distillation Framework for Biomedical Large Language Models Training
Meng Xiao
Xunxin Cai
Chengrui Wang
Yuanchun Zhou
48
0
0
28 Apr 2025
Exploring the Role of Knowledge Graph-Based RAG in Japanese Medical Question Answering with Small-Scale LLMs
Yingjian Chen
Feiyang Li
Xingyu Song
Tianxiao Li
Zixin Xu
Xiujie Chen
Issey Sukeda
Irene Z Li
21
0
0
15 Apr 2025
Benchmarking Biopharmaceuticals Retrieval-Augmented Generation Evaluation
Hanmeng Zhong
Linqing Chen
Weilei Wang
Wentao Wu
19
0
0
15 Apr 2025
Performance of Large Language Models in Supporting Medical Diagnosis and Treatment
Diogo Sousa
Guilherme Barbosa
Catarina Rocha
Dulce Oliveira
LM&MA
ELM
AI4MH
22
0
0
14 Apr 2025
Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study
Aryan Agrawal
Lisa Alazraki
Shahin Honarvar
Marek Rei
49
0
0
03 Apr 2025
Medical Reasoning in LLMs: An In-Depth Analysis of DeepSeek R1
Birger Moëll
Fredrik Sand Aronsson
Sanian Akbar
ELM
LRM
37
1
0
27 Mar 2025
Evaluating Clinical Competencies of Large Language Models with a General Practice Benchmark
Z. Li
Yiying Yang
Jiping Lang
Wenhao Jiang
Yuhang Zhao
...
Yuhua Bi
Xiaofei Zeng
Yixian Chen
Junrong Chen
Lin Yao
AI4MH
LM&MA
ELM
41
0
0
22 Mar 2025
RESPONSE: Benchmarking the Ability of Language Models to Undertake Commonsense Reasoning in Crisis Situation
Aissatou Diallo
Antonis Bikakis
Luke Dickens
Anthony Hunter
Rob Miller
ReLM
LRM
45
0
0
14 Mar 2025
It is Too Many Options: Pitfalls of Multiple-Choice Questions in Generative AI and Medical Education
Shrutika Singh
Anton Alyakin
Daniel Alber
Jaden Stryker
Ai Phuong S Tong
...
Mathew de la Paz
Miguel Hernandez-Rovira
Ki Yun Park
Eric Leuthardt
E. Oermann
AI4MH
AI4Ed
ELM
56
0
0
13 Mar 2025
SurgRAW: Multi-Agent Workflow with Chain-of-Thought Reasoning for Surgical Intelligence
Chang Han Low
Ziyue Wang
Tianyi Zhang
Zhitao Zeng
Zhu Zhuo
E. Mazomenos
Yueming Jin
LRM
46
1
0
13 Mar 2025
Are ECGs enough? Deep learning classification of cardiac anomalies using only electrocardiograms
Joao Marques
Arlindo L. Oliveira
37
0
0
11 Mar 2025
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning
Xiangru Tang
Daniel Shao
Jiwoong Sohn
Jiapeng Chen
Jiayi Zhang
...
Yilun Zhao
Chenglin Wu
Wenqi Shi
Arman Cohan
Mark B. Gerstein
AI4MH
LRM
ELM
LM&MA
62
4
0
10 Mar 2025
CUPCase: Clinically Uncommon Patient Cases and Diagnoses Dataset
Oriel Perets
Ofir Ben Shoham
Nir Grinberg
Nadav Rappoport
ELM
34
0
0
08 Mar 2025
Can Frontier LLMs Replace Annotators in Biomedical Text Mining? Analyzing Challenges and Exploring Solutions
Yichong Zhao
Susumu Goto
55
0
0
05 Mar 2025
Bandit-Based Prompt Design Strategy Selection Improves Prompt Optimizers
Rin Ashizawa
Yoichi Hirose
Nozomu Yoshinari
Kento Uchida
Shinichi Shirakawa
59
0
0
03 Mar 2025
Evidence of conceptual mastery in the application of rules by Large Language Models
José Luiz Nunes
G. F. C. F. Almeida
Brian Flanagan
29
0
0
02 Mar 2025
An evaluation of DeepSeek Models in Biomedical Natural Language Processing
Zaifu Zhan
Shuang Zhou
Huixue Zhou
Jiawen Deng
Yu Hou
Jeremy Yeung
Rui Zhang
ELM
44
0
0
01 Mar 2025
FedMentalCare: Towards Privacy-Preserving Fine-Tuned LLMs to Analyze Mental Health Status Using Federated Learning Framework
S M Sarwar
AI4MH
39
0
0
27 Feb 2025
Repurposing the scientific literature with vision-language models
Anton Alyakin
Jaden Stryker
Daniel Alber
Karl L. Sangwon
Brandon Duderstadt
...
Laura Snyder
Eric Leuthardt
Douglas Kondziolka
E. Oermann
Eric Karl Oermann
92
0
0
26 Feb 2025
Elucidating Mechanisms of Demographic Bias in LLMs for Healthcare
Hiba Ahsan
Arnab Sen Sharma
Silvio Amir
David Bau
Byron C. Wallace
75
0
0
20 Feb 2025
CMQCIC-Bench: A Chinese Benchmark for Evaluating Large Language Models in Medical Quality Control Indicator Calculation
Guangya Yu
Yanhao Li
Zongying Jiang
Yuxiong Jin
Li Dai
...
Weiyan Zhang
Yongqi Fan
Qi Ye
Jingping Liu
Tong Ruan
LM&MA
ELM
69
0
0
17 Feb 2025
Do Large Language Models Reason Causally Like Us? Even Better?
Hanna M. Dettki
Brenden M. Lake
Charley M. Wu
Bob Rehder
ReLM
ELM
LRM
90
0
0
14 Feb 2025
Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models
Ran Xu
Hejie Cui
Yue Yu
Xuan Kan
Wenqi Shi
Yuchen Zhuang
Wei Jin
Joyce C. Ho
Carl Yang
58
12
0
28 Jan 2025
A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics
Kai He
Rui Mao
Qika Lin
Yucheng Ruan
Xiang Lan
Mengling Feng
Erik Cambria
LM&MA
AILaw
79
148
0
28 Jan 2025
ASTRID -- An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems
Mohita Chowdhury
Yajie Vera He
Aisling Higham
Ernest Lim
55
1
0
14 Jan 2025
Hierarchical Divide-and-Conquer for Fine-Grained Alignment in LLM-Based Medical Evaluation
Shunfan Zheng
Xiechi Zhang
Gerard de Melo
Xiaoling Wang
Linlin Wang
LM&MA
ELM
31
0
0
12 Jan 2025
Lived Experience Not Found: LLMs Struggle to Align with Experts on Addressing Adverse Drug Reactions from Psychiatric Medication Use
Mohit Chandra
Siddharth Sriraman
Gaurav Verma
Harneet Singh Khanuja
Jose Suarez Campayo
Zihang Li
Michael L. Birnbaum
M. D. Choudhury
AI4MH
26
5
0
08 Jan 2025
Towards Omni-RAG: Comprehensive Retrieval-Augmented Generation for Large Language Models in Medical Applications
Zhe Chen
Yusheng Liao
Shuyang Jiang
Pingjie Wang
Y. Guo
Y. Wang
Yu Wang
39
3
0
05 Jan 2025
A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine
Hanguang Xiao
Feizhong Zhou
X. Liu
Tianqi Liu
Zhipeng Li
Xin Liu
Xiaoxuan Huang
AILaw
LM&MA
LRM
59
17
0
31 Dec 2024
Linguistic Features Extracted by GPT-4 Improve Alzheimer's Disease Detection based on Spontaneous Speech
Jonathan Heitz
Gerold Schneider
Nicolas Langer
LM&MA
86
0
0
20 Dec 2024
ACE-$M^3$: Automatic Capability Evaluator for Multimodal Medical Models
Xiechi Zhang
Shunfan Zheng
Linlin Wang
Gerard de Melo
Zhu Cao
Xiaoling Wang
Liang He
ELM
99
0
0
16 Dec 2024
Rephrasing Electronic Health Records for Pretraining Clinical Language Models
Jinghui Liu
Anthony N. Nguyen
SyDa
LM&MA
69
0
0
28 Nov 2024
OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining
Ming Hu
Kun Yuan
Yaling Shen
Feilong Tang
Xiaohao Xu
...
Jin Ye
N. Padoy
Nassir Navab
Junjun He
Zongyuan Ge
VLM
CLIP
85
10
0
23 Nov 2024
A dataset of questions on decision-theoretic reasoning in Newcomb-like problems
Caspar Oesterheld
Emery Cooper
Miles Kodama
Linh Chi Nguyen
Ethan Perez
29
1
0
15 Nov 2024
Beyond the Safety Bundle: Auditing the Helpful and Harmless Dataset
Khaoula Chehbouni
Jonathan Colaço-Carr
Yash More
Jackie CK Cheung
G. Farnadi
71
0
0
12 Nov 2024
From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond
Harsha Nori
Naoto Usuyama
Nicholas King
S. McKinney
Xavier Fernandes
Sheng Zhang
Eric Horvitz
LRM
LM&MA
ELM
VLM
52
8
0
06 Nov 2024
"It's a conversation, not a quiz": A Risk Taxonomy and Reflection Tool for LLM Adoption in Public Health
Jiawei Zhou
Amy Z. Chen
Darshi Shah
Laura Schwab Reese
Munmun De Choudhury
20
0
0
04 Nov 2024
Improving Scientific Hypothesis Generation with Knowledge Grounded Large Language Models
Guangzhi Xiong
Eric Xie
Amir Hassan Shariatmadari
Sikun Guo
Stefan Bekiranov
Aidong Zhang
LRM
HILM
26
6
0
04 Nov 2024
Beyond Accuracy: Ensuring Correct Predictions With Correct Rationales
Tang Li
Mengmeng Ma
Xi Peng
26
2
0
31 Oct 2024
GPT-4o System Card
OpenAI
Aaron Hurst
Adam Lerer
Adam P. Goucher
...
Yuchen He
Yuchen Zhang
Yujia Jin
Yunxing Dai
Yury Malkov
MLLM
48
13
0
25 Oct 2024
Demystifying Large Language Models for Medicine: A Primer
Qiao Jin
Nicholas Wan
Robert Leaman
Shubo Tian
Zhizheng Wang
...
Chunhua Weng
Ronald M. Summers
Qingyu Chen
Yifan Peng
Zhiyong Lu
LM&MA
32
3
0
24 Oct 2024
Enhancing Answer Attribution for Faithful Text Generation with Large Language Models
Juraj Vladika
Luca Mülln
Florian Matthes
18
0
0
22 Oct 2024
An Eye for an AI: Evaluating GPT-4o's Visual Perception Skills and Geometric Reasoning Skills Using Computer Graphics Questions
Tony Haoran Feng
Paul Denny
Burkhard C. Wünsche
Andrew Luxton-Reilly
Jacqueline Whalley
18
3
0
22 Oct 2024
Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning
Zongmeng Zhang
Yufeng Shi
Jinhua Zhu
Wengang Zhou
Xiang Qi
Peng Zhang
H. Li
RALM
HILM
16
0
0
22 Oct 2024
How Can We Diagnose and Treat Bias in Large Language Models for Clinical Decision-Making?
Kenza Benkirane
Jackie Kay
Maria Perez-Ortiz
23
0
0
21 Oct 2024
Are LLMs Good Zero-Shot Fallacy Classifiers?
Fengjun Pan
Xiaobao Wu
Zongrui Li
Anh Tuan Luu
LRM
38
9
0
19 Oct 2024