2505.19176
Assistant-Guided Mitigation of Teacher Preference Bias in LLM-as-a-Judge
25 May 2025
Zhuo Liu
Moxin Li
Xun Deng
Qifan Wang
Fuli Feng
ELM
Papers citing "Assistant-Guided Mitigation of Teacher Preference Bias in LLM-as-a-Judge"
19 / 19 papers shown
Do LLM Evaluators Prefer Themselves for a Reason?
Wei-Lin Chen
Zhepei Wei
Xinyu Zhu
Shi Feng
Yu Meng
ELM
LRM
359
22
0
04 Apr 2025
PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations
Ruosen Li
Teerth Patel
Xinya Du
LLMAG
ALM
556
126
0
03 Jan 2025
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
International Conference on Learning Representations (ICLR), 2024
Jiayi Ye
Zixiang Xu
Yue Huang
Dongping Chen
Qihui Zhang
...
Werner Geyer
Chao Huang
Pin-Yu Chen
Nitesh Chawla
Xiangliang Zhang
ELM
368
207
0
03 Oct 2024
Aligning Human and LLM Judgments: Insights from EvalAssist on Task-Specific Evaluations and AI-assisted Assessment Strategy Preferences
ACM Symposium on User Interface Software and Technology (UIST), 2024
Zahra Ashktorab
Michael Desmond
Qian Pan
James M. Johnson
Martin Santillan Cooper
Elizabeth M. Daly
Rahul Nair
Tejaswini Pedapati
Swapnaja Achintalwar
Werner Geyer
ELM
287
8
0
01 Oct 2024
Beyond Scalar Reward Model: Learning Generative Judge from Preference Data
Ziyi Ye
Xiangsheng Li
Qiuchi Li
Jiaxin Mao
Yujia Zhou
Wei Shen
Dong Yan
Yiqun Liu
281
35
0
01 Oct 2024
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
Tianhao Wu
Weizhe Yuan
O. Yu. Golovneva
Jing Xu
Yuandong Tian
Jiantao Jiao
Jason Weston
Sainbayar Sukhbaatar
ALM
KELM
LRM
374
156
0
28 Jul 2024
LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks
A. Bavaresco
Raffaella Bernardi
Leonardo Bertolazzi
Desmond Elliott
Raquel Fernández
...
David Schlangen
Alessandro Suglia
Aditya K Surikuchi
Ece Takmaz
A. Testoni
ALM
ELM
616
186
0
26 Jun 2024
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Seungone Kim
Juyoung Suk
Shayne Longpre
Bill Yuchen Lin
Jamin Shin
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
MoMe
ALM
ELM
389
331
0
02 May 2024
LLM Evaluators Recognize and Favor Their Own Generations
Arjun Panickssery
Samuel R. Bowman
Shi Feng
446
366
0
15 Apr 2024
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Yann Dubois
Balázs Galambosi
Abigail Z. Jacobs
Tatsunori Hashimoto
ALM
465
617
0
06 Apr 2024
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Wei-Lin Chiang
Lianmin Zheng
Ying Sheng
Anastasios Nikolas Angelopoulos
Tianle Li
...
Hao Zhang
Banghua Zhu
Michael I. Jordan
Joseph E. Gonzalez
Ion Stoica
OSLM
431
992
0
07 Mar 2024
Humans or LLMs as the Judge? A Study on Judgement Biases
Guiming Hardy Chen
Shunian Chen
Ziche Liu
Feng Jiang
Benyou Wang
568
214
0
16 Feb 2024
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
International Conference on Learning Representations (ICLR), 2023
Lianghui Zhu
Xinggang Wang
Xinlong Wang
ELM
ALM
473
258
0
26 Oct 2023
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
International Conference on Learning Representations (ICLR), 2023
Seungone Kim
Jamin Shin
Yejin Cho
Joel Jang
Shayne Longpre
...
Sangdoo Yun
Seongjin Shin
Sungdong Kim
James Thorne
Minjoon Seo
ALM
LM&MA
ELM
534
375
0
12 Oct 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Neural Information Processing Systems (NeurIPS), 2023
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
3.2K
6,725
0
09 Jun 2023
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization
International Conference on Learning Representations (ICLR), 2023
Yidong Wang
Zhuohao Yu
Zhengran Zeng
Linyi Yang
Cunxiang Wang
...
Yongfeng Zhang
Xingxu Xie
Wei Ye
Shi-Bo Zhang
Yue Zhang
ALM
ELM
479
332
0
08 Jun 2023
Benchmarking Foundation Models with Language-Model-as-an-Examiner
Neural Information Processing Systems (NeurIPS), 2023
Yushi Bai
Jiahao Ying
Yixin Cao
Xin Lv
Yuze He
...
Yijia Xiao
Haozhe Lyu
Jiayin Zhang
Juanzi Li
Lei Hou
ALM
ELM
293
201
0
07 Jun 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Neural Information Processing Systems (NeurIPS), 2023
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
953
6,888
0
29 May 2023
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
ELM
ALM
LM&MA
608
1,873
0
29 Mar 2023