2505.11887
AutoMedEval: Harnessing Language Models for Automatic Medical Capability Evaluation
17 May 2025
Xiechi Zhang
Zetian Ouyang
Linlin Wang
Gerard de Melo
Zhu Cao
Xiaoling Wang
Ya Zhang
Yanfeng Wang
Liang He
LM&MA
ELM
Papers citing
"AutoMedEval: Harnessing Language Models for Automatic Medical Capability Evaluation"
18 / 18 papers shown
MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks
Allen Nie
Yuhui Zhang
Atharva Amdekar
Chris Piech
Tatsunori Hashimoto
Tobias Gerstenberg
53
38
0
30 Oct 2023
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Tri Dao
LRM
97
1,221
0
17 Jul 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
300
4,186
0
09 Jun 2023
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization
Yidong Wang
Zhuohao Yu
Zhengran Zeng
Linyi Yang
Cunxiang Wang
...
Jindong Wang
Xingxu Xie
Wei Ye
Shi-Bo Zhang
Yue Zhang
ALM
ELM
95
242
0
08 Jun 2023
Introspective Tips: Large Language Model for In-Context Decision Making
Liting Chen
Lu Wang
Hang Dong
Yali Du
Jie Yan
...
Pu Zhao
Si Qin
Saravan Rajmohan
Qingwei Lin
Dongmei Zhang
LLMAG
LRM
81
25
0
19 May 2023
MedGPTEval: A Dataset and Benchmark to Evaluate Responses of Large Language Models in Medicine
Jie Xu
Lu Lu
Sen Yang
Bilin Liang
Xinwei Peng
...
Lingrui Yang
Huan-Zhi Song
Kang Li
Xin Sun
Shaoting Zhang
LM&MA
AI4MH
34
7
0
12 May 2023
DoctorGLM: Fine-tuning your Chinese Doctor is not a Herculean Task
Honglin Xiong
Sheng Wang
Yitao Zhu
Zihao Zhao
Yuxiao Liu
Linlin Huang
Qian Wang
Dinggang Shen
LM&MA
AI4MH
44
170
0
03 Apr 2023
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
ELM
ALM
LM&MA
153
1,138
0
29 Mar 2023
ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge
Yunxiang Li
Zihan Li
Kai Zhang
Ruilong Dan
Steven Jiang
You Zhang
LM&MA
AI4MH
151
395
0
24 Mar 2023
Capabilities of GPT-4 on Medical Challenge Problems
Harsha Nori
Nicholas King
S. McKinney
Dean Carignan
Eric Horvitz
LM&MA
ELM
AI4MH
91
793
0
20 Mar 2023
Is ChatGPT a Good NLG Evaluator? A Preliminary Study
Jiaan Wang
Yunlong Liang
Fandong Meng
Zengkui Sun
Haoxiang Shi
Zhixu Li
Jinan Xu
Jianfeng Qu
Jie Zhou
LM&MA
ELM
ALM
AI4MH
101
458
0
07 Mar 2023
Large Language Models Encode Clinical Knowledge
K. Singhal
Shekoofeh Azizi
T. Tu
S. S. Mahdavi
Jason W. Wei
...
A. Rajkomar
Joelle Barral
Christopher Semturs
Alan Karthikesalingam
Vivek Natarajan
LM&MA
ELM
AI4MH
114
2,283
0
26 Dec 2022
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
VLM
183
2,131
0
27 May 2022
BARTScore: Evaluating Generated Text as Text Generation
Weizhe Yuan
Graham Neubig
Pengfei Liu
93
829
0
22 Jun 2021
SimCSE: Simple Contrastive Learning of Sentence Embeddings
Tianyu Gao
Xingcheng Yao
Danqi Chen
AILaw
SSL
209
3,336
0
18 Apr 2021
What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams
Di Jin
Eileen Pan
Nassim Oufattole
W. Weng
Hanyi Fang
Peter Szolovits
FaML
ELM
LM&MA
80
749
0
28 Sep 2020
BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
245
5,668
0
21 Apr 2019
Toward a Formal Model of Cognitive Synergy
B. Goertzel
26
13
0
13 Mar 2017