Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2310.19736
Cited By
Evaluating Large Language Models: A Comprehensive Survey
30 October 2023
Zishan Guo
Renren Jin
Chuang Liu
Yufei Huang
Dan Shi
Supryadi Supryadi
Linhao Yu
Yan Liu
Jiaxuan Li
Bojian Xiong
Deyi Xiong
ELM
LM&MA
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Evaluating Large Language Models: A Comprehensive Survey"
14 / 14 papers shown
Title
Securing RAG: A Risk Assessment and Mitigation Framework
Lukas Ammann
Sara Ott
Christoph R. Landolt
Marco P. Lehmann
SILM
58
0
0
13 May 2025
Semantic Retention and Extreme Compression in LLMs: Can We Have Both?
Stanislas Laborde
Martin Cousseau
Antoun Yaacoub
Lionel Prevost
MQ
52
0
0
12 May 2025
Adversarial Cooperative Rationalization: The Risk of Spurious Correlations in Even Clean Datasets
Wen Liu
Zhongyu Niu
Lang Gao
Zhiying Deng
Jun Wang
Haobo Wang
Ruixuan Li
389
1
0
04 May 2025
Benchmarking LLM-based Relevance Judgment Methods
Negar Arabzadeh
Charles L. A. Clarke
55
0
0
17 Apr 2025
FOReCAst: The Future Outcome Reasoning and Confidence Assessment Benchmark
Zhangdie Yuan
Zifeng Ding
Andreas Vlachos
AI4TS
129
0
0
27 Feb 2025
ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use
Junjie Ye
Zhengyin Du
Xuesong Yao
Weijian Lin
Yufei Xu
...
Siyu Yuan
Tao Gui
Qi Zhang
Xuanjing Huang
Jiecao Chen
79
0
0
05 Jan 2025
ConTrans: Weak-to-Strong Alignment Engineering via Concept Transplantation
Weilong Dong
Xinwei Wu
Renren Jin
Shaoyang Xu
Deyi Xiong
78
8
0
31 Dec 2024
Undesirable Memorization in Large Language Models: A Survey
Ali Satvaty
Suzan Verberne
Fatih Turkmen
ELM
PILM
126
7
0
03 Oct 2024
LMOD: A Large Multimodal Ophthalmology Dataset and Benchmark for Large Vision-Language Models
Zhenyue Qin
Yu Yin
Dylan Campbell
Xuansheng Wu
Ke Zou
Yih-Chung Tham
Ninghao Liu
Xiuzhen Zhang
Qingyu Chen
61
1
0
02 Oct 2024
LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs
Do Xuan Long
Hai Nguyen Ngoc
Tiviatis Sim
Hieu Dao
Shafiq Joty
Kenji Kawaguchi
Nancy F. Chen
Min-Yen Kan
66
8
0
16 Aug 2024
On the Workflows and Smells of Leaderboard Operations (LBOps): An Exploratory Study of Foundation Model Leaderboards
Zhimin Zhao
A. A. Bangash
F. Côgo
Bram Adams
Ahmed E. Hassan
91
1
0
04 Jul 2024
Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective
Yuchen Wen
Keping Bi
Wei Chen
Jiafeng Guo
Xueqi Cheng
96
2
0
20 Jun 2024
Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition
Kehua Feng
Keyan Ding
Hongzhi Tan
Kede Ma
Zhihua Wang
...
Yuzhou Cheng
Ge Sun
Guozhou Zheng
Qiang Zhang
H. Chen
60
12
0
10 Apr 2024
Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language Models
Lucio La Cava
Andrea Tagarelli
LLMAG
AI4CE
81
13
0
13 Jan 2024
1