Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2305.17926
Cited By
Large Language Models are not Fair Evaluators
29 May 2023
Peiyi Wang
Lei Li
Liang Chen
Zefan Cai
Dawei Zhu
Binghuai Lin
Yunbo Cao
Qi Liu
Tianyu Liu
Zhifang Sui
ALM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Large Language Models are not Fair Evaluators"
25 / 75 papers shown
Title
Benchmark Data Contamination of Large Language Models: A Survey
Cheng Xu
Shuhao Guan
Derek Greene
Mohand-Tahar Kechadi
ELM
ALM
36
38
0
06 Jun 2024
Discovering Bias in Latent Space: An Unsupervised Debiasing Approach
Dyah Adila
Shuai Zhang
Boran Han
Yuyang Wang
AAML
LLMSV
27
6
0
05 Jun 2024
SLIDE: A Framework Integrating Small and Large Language Models for Open-Domain Dialogues Evaluation
Kun Zhao
Bohao Yang
Chen Tang
Chenghua Lin
Liang Zhan
33
5
0
24 May 2024
SPO: Multi-Dimensional Preference Sequential Alignment With Implicit Reward Modeling
Xingzhou Lou
Junge Zhang
Jian Xie
Lifeng Liu
Dong Yan
Kaiqi Huang
29
11
0
21 May 2024
Fennec: Fine-grained Language Model Evaluation and Correction Extended through Branching and Bridging
Xiaobo Liang
Haoke Zhang
Helan hu
Juntao Li
Jun Xu
Min Zhang
ALM
33
2
0
20 May 2024
Latent Concept-based Explanation of NLP Models
Xuemin Yu
Fahim Dalvi
Nadir Durrani
Marzia Nouri
Hassan Sajjad
LRM
FAtt
19
1
0
18 Apr 2024
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Yann Dubois
Balázs Galambosi
Percy Liang
Tatsunori Hashimoto
ALM
53
315
0
06 Apr 2024
Dialectical Alignment: Resolving the Tension of 3H and Security Threats of LLMs
Shu Yang
Jiayuan Su
Han Jiang
Mengdi Li
Keyuan Cheng
Muhammad Asif Ali
Lijie Hu
Di Wang
16
5
0
30 Mar 2024
Towards Training A Chinese Large Language Model for Anesthesiology
Zhonghai Wang
Jie Jiang
Yibing Zhan
Bohao Zhou
Yanhong Li
...
Liang Ding
Hua Jin
Jun Peng
Xu Lin
Weifeng Liu
LM&MA
25
3
0
05 Mar 2024
LLM-SQL-Solver: Can LLMs Determine SQL Equivalence?
Fuheng Zhao
Lawrence Lim
Ishtiyaque Ahmad
D. Agrawal
A. El Abbadi
Amr El Abbadi
54
9
0
16 Dec 2023
LLMEval: A Preliminary Study on How to Evaluate Large Language Models
Yue Zhang
Ming Zhang
Haipeng Yuan
Shichun Liu
Yongyao Shi
Tao Gui
Qi Zhang
Xuanjing Huang
ALM
ELM
11
10
0
12 Dec 2023
Universal Self-Consistency for Large Language Model Generation
Xinyun Chen
Renat Aksitov
Uri Alon
Jie Jessie Ren
Kefan Xiao
Pengcheng Yin
Sushant Prakash
Charles Sutton
Xuezhi Wang
Denny Zhou
LRM
24
65
0
29 Nov 2023
Test-time Backdoor Mitigation for Black-Box Large Language Models with Defensive Demonstrations
Wenjie Mo
Jiashu Xu
Qin Liu
Jiong Wang
Jun Yan
Chaowei Xiao
Muhao Chen
Muhao Chen
AAML
46
17
0
16 Nov 2023
Evaluation Metrics in the Era of GPT-4: Reliably Evaluating Large Language Models on Sequence to Sequence Tasks
Andrea Sottana
Bin Liang
Kai Zou
Zheng Yuan
ALM
ELM
LM&MA
25
53
0
20 Oct 2023
Constructive Large Language Models Alignment with Diverse Feedback
Tianshu Yu
Ting-En Lin
Yuchuan Wu
Min Yang
Fei Huang
Yongbin Li
ALM
30
8
0
10 Oct 2023
Generative Judge for Evaluating Alignment
Junlong Li
Shichao Sun
Weizhe Yuan
Run-Ze Fan
Hai Zhao
Pengfei Liu
ELM
ALM
12
76
0
09 Oct 2023
Ragas: Automated Evaluation of Retrieval Augmented Generation
ES Shahul
Jithin James
Luis Espinosa-Anke
Steven Schockaert
80
174
0
26 Sep 2023
Foundation Metrics for Evaluating Effectiveness of Healthcare Conversations Powered by Generative AI
Mahyar Abbasian
Elahe Khatibi
Iman Azimi
David Oniani
Zahra Shakeri Hossein Abad
...
Bryant Lin
Olivier Gevaert
Li-Jia Li
Ramesh C. Jain
Amir M. Rahmani
LM&MA
ELM
AI4MH
23
63
0
21 Sep 2023
Bias Testing and Mitigation in LLM-based Code Generation
Dong Huang
Qingwen Bu
Jie M. Zhang
Xiaofei Xie
Junjie Chen
Heming Cui
33
20
0
03 Sep 2023
Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions
Pouya Pezeshkpour
Estevam R. Hruschka
LRM
8
123
0
22 Aug 2023
FairMonitor: A Four-Stage Automatic Framework for Detecting Stereotypes and Biases in Large Language Models
Yanhong Bai
Jiabao Zhao
Jinxin Shi
Tingjiang Wei
Xingjiao Wu
Liangbo He
17
0
0
21 Aug 2023
Empower Large Language Model to Perform Better on Industrial Domain-Specific Question Answering
Fangkai Yang
Pu Zhao
Zezhong Wang
Lu Wang
Jue Zhang
Mohit Garg
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
29
47
0
19 May 2023
Instruction Tuning with GPT-4
Baolin Peng
Chunyuan Li
Pengcheng He
Michel Galley
Jianfeng Gao
SyDa
ALM
LM&MA
157
576
0
06 Apr 2023
Learning Robust Representations for Continual Relation Extraction via Adversarial Class Augmentation
Peiyi Wang
Yifan Song
Tianyu Liu
Binghuai Lin
Yunbo Cao
Sujian Li
Zhifang Sui
CLL
SLR
27
22
0
10 Oct 2022
HypoNLI: Exploring the Artificial Patterns of Hypothesis-only Bias in Natural Language Inference
Tianyu Liu
Xin Zheng
Baobao Chang
Zhifang Sui
32
22
0
05 Mar 2020
Previous
1
2