AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments

13 May 2024
Samuel Schmidgall
Rojin Ziaei
Carl Harris
Eduardo Reis
Jeffrey Jopling
Michael Moor

Papers citing "AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments"

50 / 104 papers shown
Medchain: Bridging the Gap Between LLM Agents and Clinical Practice with Interactive Sequence
Jie Liu
Wenxuan Wang
Zizhan Ma
Guolin Huang
Yihang Su
Kao-Jung Chang
Wenting Chen
Haoliang Li
Linlin Shen
Michael R. Lyu
331
12
0
02 Dec 2024
PIORS: Personalized Intelligent Outpatient Reception based on Large Language Model with Multi-Agents Medical Scenario Simulation
Zhijie Bao
Qiang Liu
Ying Guo
Zhengqiang Ye
Jun Shen
Shirong Xie
Jiajie Peng
Xuanjing Huang
Zhongyu Wei
320
4
0
21 Nov 2024
VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge
Computer Vision and Pattern Recognition (CVPR), 2024
Vishwesh Nath
Wenqi Li
Dong Yang
Andriy Myronenko
Mingxin Zheng
...
Baris Turkbey
Holger Roth
Daguang Xu
VLM
519
24
0
19 Nov 2024
Enhancing Investment Analysis: Optimizing AI-Agent Collaboration in Financial Research
International Conference on AI in Finance (ICAF), 2024
Xuewen Han
Neng Wang
Shangkun Che
Hongyang Yang
Kunpeng Zhang
S. Xu
AIFin
143
25
0
07 Nov 2024
Social Science Meets LLMs: How Reliable Are Large Language Models in Social Simulations?
Yue Huang
Zhengqing Yuan
Yujun Zhou
Kehan Guo
Xiangqi Wang
...
Weixiang Sun
Lichao Sun
Jindong Wang
Yanfang Ye
Wei Wei
LLMAG
163
23
0
30 Oct 2024
MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Zonghai Yao
Aditya Parashar
Huixue Zhou
Won Seok Jang
Feiyun Ouyang
Zhichao Yang
Hong-ye Yu
ELM
406
14
0
17 Oct 2024
Adaptive Reasoning and Acting in Medical Language Agents
Abhishek Dutta
Yen-Che Hsiao
AI4CE, LM&MA
105
8
0
13 Oct 2024
MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders
Cheng-rong Li
May Fung
Qingyun Wang
Chi Han
Pengfei Yu
Jindong Wang
Heng Ji
AI4MH
867
1
0
09 Oct 2024
Simulated patient systems powered by large language model-based AI agents offer potential for transforming medical education
Communications Medicine (Commun Med), 2024
Huizi Yu
Jiayan Zhou
Jinkui Chi
Shan Chen
Jack Gallifant
...
Xin Ma
Themistocles L. Assimes
Lin Lu
Lizhou Fan
965
8
0
27 Sep 2024
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor?
Yunfei Xie
Juncheng Wu
Haoqin Tu
Siwei Yang
Bingchen Zhao
Yongshuo Zong
Qiao Jin
Cihang Xie
Yuyin Zhou
LM&MA, ELM, LRM
312
39
0
23 Sep 2024
From Text to Multimodality: Exploring the Evolution and Impact of Large Language Models in Medical Practice
Qian Niu
Keyu Chen
Ming Li
Pohsun Feng
Ziqian Bi
...
Junyu Liu
Benji Peng
Tianyang Wang
Yunze Wang
Silin Chen
LM&MA
510
11
0
14 Sep 2024
Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions
Huachuan Qiu
Zhenzhong Lan
254
36
0
28 Aug 2024
MSDiagnosis: An EMR-based Dataset for Clinical Multi-Step Diagnosis
Ruihui Hou
Shencheng Chen
Yongqi Fan
Lifeng Zhu
Jing Sun
Jingping Liu
Tong Ruan
264
0
0
19 Aug 2024
Med-PMC: Medical Personalized Multi-modal Consultation with a Proactive Ask-First-Observe-Next Paradigm
Hongcheng Liu
Yusheng Liao
Siqv Ou
Yuhao Wang
Heyang Liu
Yanfeng Wang
Yu Wang
LM&MA
159
3
0
16 Aug 2024
GP-VLS: A general-purpose vision language model for surgery
Samuel Schmidgall
Joseph Cho
C. Zakka
W. Hiesinger
LM&MA
327
17
0
27 Jul 2024
Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory
Suyeon Lee
Sunghwan Kim
Minju Kim
Dongjin Kang
Dongil Yang
...
Seungbeen Lee
Kyoung-Mee Chung
Youngjae Yu
Dongha Lee
Jinyoung Yeo
202
32
0
03 Jul 2024
MMedAgent: Learning to Use Medical Tools with Multi-modal Agent
Binxu Li
Tiankai Yan
Yuanting Pan
Zhe Xu
Jie Luo
Ruiyang Ji
Shilong Liu
Haoyu Dong
Zihao Lin
Yixin Wang
LM&MA
213
72
0
02 Jul 2024
Ask-before-Plan: Proactive Language Agents for Real-World Planning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Xuan Zhang
Yang Deng
Zifeng Ren
See-Kiong Ng
Tat-Seng Chua
LLMAG, LM&Ro
264
34
0
18 Jun 2024
Reverse Image Retrieval Cues Parametric Memory in Multimodal LLMs
Jialiang Xu
Michael Moor
J. Leskovec
170
7
0
29 May 2024
LLM Evaluators Recognize and Favor Their Own Generations
Arjun Panickssery
Samuel R. Bowman
Shi Feng
377
341
0
15 Apr 2024
m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks
Zixian Ma
Weikai Huang
Jieyu Zhang
Tanmay Gupta
Ranjay Krishna
325
34
0
17 Mar 2024
Automatic Interactive Evaluation for Large Language Models with State Aware Patient Simulator
Yusheng Liao
Yutong Meng
Yuhao Wang
Hongcheng Liu
Yanfeng Wang
Yu Wang
LM&MA, ELM
256
23
0
13 Mar 2024
Few shot chain-of-thought driven reasoning to prompt LLMs for open ended medical question answering
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Ojas Gramopadhye
Saeel Sandeep Nachane
Prateek Chanda
Ganesh Ramakrishnan
Kshitij S. Jadhav
Yatin Nandwani
Lucian Popa
Sachindra Joshi
LM&MA, ELM, LRM
233
57
0
07 Mar 2024
Benchmarking Retrieval-Augmented Generation for Medicine
Guangzhi Xiong
Qiao Jin
Zhiyong Lu
Aidong Zhang
RALM
391
359
0
20 Feb 2024
Addressing cognitive bias in medical language models
Samuel Schmidgall
Carl Harris
Ime Essien
Daniel Olshvang
Tawsifur Rahman
Ji Woong Kim
Rojin Ziaei
Nhan Duy Truong
Peter M Abadir
Rama Chellappa
ELM
287
38
0
12 Feb 2024
Towards Conversational Diagnostic AI
Tao Tu
Anil Palepu
M. Schaekermann
Khaled Saab
Jan Freyberg
...
Katherine Chou
Greg S. Corrado
Yossi Matias
Alan Karthikesalingam
Vivek Natarajan
AI4MH, LM&MA
245
136
0
11 Jan 2024
Mixtral of Experts
Albert Q. Jiang
Alexandre Sablayrolles
Antoine Roux
A. Mensch
Blanche Savary
...
Théophile Gervet
Thibaut Lavril
Thomas Wang
Timothée Lacroix
William El Sayed
MoE, LLMAG
519
1,551
0
08 Jan 2024
Retrieval-Augmented Generation for Large Language Models: A Survey
Yunfan Gao
Yun Xiong
Xinyu Gao
Kangxiang Jia
Jinliu Pan
Yuxi Bi
Yi Dai
Jiawei Sun
Meng Wang
Haofen Wang
3DV, RALM
1.2K
2,662
1
18 Dec 2023
Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine
Harsha Nori
Yin Tat Lee
Sheng Zhang
Dean Carignan
Richard Edgar
...
Hoifung Poon
Tao Qin
Naoto Usuyama
Chris White
Eric Horvitz
LM&MA, AI4MH, MedIm, ELM
241
436
0
28 Nov 2023
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models
Zeming Chen
Alejandro Hernández Cano
Angelika Romanou
Antoine Bonnet
Kyle Matoba
...
Axel Marmet
Syrielle Montariol
Mary-Anne Hartley
Martin Jaggi
Antoine Bosselut
LM&MA, AI4MH, MedIm
298
301
0
27 Nov 2023
MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning
Xiangru Tang
Anni Zou
Zhuosheng Zhang
Ziming Li
Yilun Zhao
Xingyao Zhang
Arman Cohan
Mark B. Gerstein
LRM, LM&MA
414
300
0
16 Nov 2023
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
International Conference on Learning Representations (ICLR), 2023
Akari Asai
Zeqiu Wu
Yizhong Wang
Avirup Sil
Hannaneh Hajishirzi
RALM
585
1,289
0
17 Oct 2023
Language models are susceptible to incorrect patient self-diagnosis in medical applications
Rojin Ziaei
Samuel Schmidgall
ELM, LM&MA
206
13
0
17 Sep 2023
A Survey on Large Language Model based Autonomous Agents
Lei Wang
Chengbang Ma
Xueyang Feng
Zeyu Zhang
Hao-ran Yang
...
Xu Chen
Yankai Lin
Wayne Xin Zhao
Zhewei Wei
Ji-Rong Wen
LLMAG, AI4CE, LM&Ro
654
2,068
0
22 Aug 2023
ExpeL: LLM Agents Are Experiential Learners
AAAI Conference on Artificial Intelligence (AAAI), 2023
Andrew Zhao
Daniel Huang
Quentin Xu
Matthieu Lin
Wenshu Fan
Gao Huang
LLMAG
447
336
0
20 Aug 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Neural Information Processing Systems (NeurIPS), 2023
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM, OSLM, ELM
3.1K
6,484
0
09 Jun 2023
Improving Factuality and Reasoning in Language Models through Multiagent Debate
International Conference on Machine Learning (ICML), 2023
Yilun Du
Shuang Li
Antonio Torralba
J. Tenenbaum
Igor Mordatch
LLMAG, LRM
334
1,156
0
23 May 2023
Active Retrieval Augmented Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Zhengbao Jiang
Frank F. Xu
Luyu Gao
Zhiqing Sun
Qian Liu
Jane Dwivedi-Yu
Yiming Yang
Jamie Callan
Graham Neubig
RALM
357
466
0
11 May 2023
PMC-LLaMA: Towards Building Open-source Language Models for Medicine
Chaoyi Wu
Weixiong Lin
Xiaoman Zhang
Ya Zhang
Yanfeng Wang
Weidi Xie
LM&MA, AI4MH
287
98
0
27 Apr 2023
Generative Agents: Interactive Simulacra of Human Behavior
ACM Symposium on User Interface Software and Technology (UIST), 2023
Joon Sung Park
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
LM&Ro, AI4CE
843
2,951
0
07 Apr 2023
Almanac: Retrieval-Augmented Language Models for Clinical Medicine
Research Square (RS), 2023
C. Zakka
Akash Chaurasia
R. Shad
Alex R. Dalal
Jennifer L. Kim
...
Kathleen Boyd
Karen Hirsch
C. Langlotz
Joanna Nelson
W. Hiesinger
LM&MA
398
215
0
01 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM, PILM
4.1K
17,584
0
27 Feb 2023
Large Language Models Encode Clinical Knowledge
Nature (Nature), 2022
K. Singhal
Shekoofeh Azizi
T. Tu
S. S. Mahdavi
Jason W. Wei
...
A. Rajkomar
Joelle Barral
Christopher Semturs
Alan Karthikesalingam
Vivek Natarajan
LM&MA, ELM, AI4MH
602
3,347
0
26 Dec 2022
Can large language models reason about medical questions?
Patterns (Patterns), 2022
Valentin Liévin
C. Hother
Andreas Geert Motzfeldt
Ole Winther
ELM, LM&MA, AI4MH, LRM
494
387
0
17 Jul 2022
Large Language Models are Zero-Shot Reasoners
Neural Information Processing Systems (NeurIPS), 2022
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM, LRM
1.3K
6,003
0
24 May 2022
MedMCQA : A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering
ACM Conference on Health, Inference, and Learning (ACM CHIL), 2022
Ankit Pal
Logesh Kumar Umapathi
Malaikannan Sankarasubbu
ELM, LM&MA
449
514
0
27 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Neural Information Processing Systems (NeurIPS), 2022
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro, LRM, AI4CE, ReLM
2.3K
14,365
0
28 Jan 2022
What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams
Applied Sciences (Appl. Sci.), 2020
Di Jin
Eileen Pan
Nassim Oufattole
W. Weng
Hanyi Fang
Peter Szolovits
FaML, ELM, LM&MA
420
1,232
0
28 Sep 2020
Measuring Massive Multitask Language Understanding
International Conference on Learning Representations (ICLR), 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM, RALM
2.0K
6,463
0
07 Sep 2020
Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing
Yu Gu
Robert Tinn
Hao Cheng
Michael R. Lucas
Naoto Usuyama
Xiaodong Liu
Tristan Naumann
Jianfeng Gao
Hoifung Poon
LM&MA, AI4CE
636
2,133
0
31 Jul 2020