ResearchTrend.AI
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators


20 January 2025
Yinhong Liu
Han Zhou
Zhijiang Guo
Ehsan Shareghi
Ivan Vulić
Anna Korhonen
Nigel Collier
    ALM

Papers citing "Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators"

50 / 53 papers shown
QBD-RankedDataGen: Generating Custom Ranked Datasets for Improving Query-By-Document Search Using LLM-Reranking with Reduced Human Effort
Sriram Gopalakrishnan
Sunandita Patra
21
0
0
07 May 2025
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models
Bang Zhang
Ruotian Ma
Qingxuan Jiang
Peisong Wang
Jiaqi Chen
...
Fanghua Ye
Jian Li
Yifan Yang
Zhaopeng Tu
Xiaolong Li
LLMAG
ELM
ALM
95
25
1
01 May 2025
Explanatory Summarization with Discourse-Driven Planning
Dongqi Liu
Xi Yu
Vera Demberg
Mirella Lapata
45
0
0
27 Apr 2025
Optimizing Compound Retrieval Systems
Harrie Oosterhuis
R. Jagerman
Zhen Qin
Xuanhui Wang
33
0
0
16 Apr 2025
HypoEval: Hypothesis-Guided Evaluation for Natural Language Generation
Mingxuan Li
Hanchen Li
Chenhao Tan
ALM
ELM
42
0
0
09 Apr 2025
Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics
Hamed Mahdavi
Alireza Hashemi
Majid Daliri
Pegah Mohammadipour
Alireza Farhadi
Samira Malek
Yekta Yazdanifard
Amir Khasahmadi
V. Honavar
ELM
LRM
43
1
0
01 Apr 2025
Learning to Reason for Long-Form Story Generation
Alexander Gurung
Mirella Lapata
ReLM
OffRL
LRM
53
0
0
28 Mar 2025
Improving Preference Extraction In LLMs By Identifying Latent Knowledge Through Classifying Probes
Sharan Maiya
Yinhong Liu
Ramit Debnath
Anna Korhonen
30
0
0
22 Mar 2025
Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings
Austin Xu
Srijan Bansal
Yifei Ming
Semih Yavuz
Shafiq R. Joty
ELM
89
2
0
19 Mar 2025
UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation
Qihui Zhang
Munan Ning
Zheyuan Liu
Yanbo Wang
Jiayi Ye
Yue Huang
Shuo Yang
Xiao Chen
Y. Song
Li Yuan
LRM
56
0
0
19 Mar 2025
No Free Labels: Limitations of LLM-as-a-Judge Without Human Grounding
Michael Krumdick
Charles Lovering
Varshini Reddy
Seth Ebner
Chris Tanner
ALM
ELM
51
2
0
07 Mar 2025
Process-based Self-Rewarding Language Models
Shimao Zhang
Xiao Liu
Xin Zhang
Junxiao Liu
Zheheng Luo
Shujian Huang
Yeyun Gong
ReLM
SyDa
LRM
93
2
0
05 Mar 2025
Improving LLM-as-a-Judge Inference with the Judgment Distribution
Victor Wang
Michael J.Q. Zhang
Eunsol Choi
53
0
0
04 Mar 2025
Multi2: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing
Juntai Cao
Xiang Zhang
Raymond Li
Chuyuan Li
Shafiq R. Joty
Giuseppe Carenini
54
1
0
27 Feb 2025
When Personalization Meets Reality: A Multi-Faceted Analysis of Personalized Preference Learning
Yijiang River Dong
Tiancheng Hu
Yinhong Liu
Ahmet Üstün
Nigel Collier
75
1
0
26 Feb 2025
Judge as A Judge: Improving the Evaluation of Retrieval-Augmented Generation through the Judge-Consistency of Large Language Models
Shuliang Liu
Xinze Li
Zhenghao Liu
Yukun Yan
Cheng Yang
Zheni Zeng
Zhiyuan Liu
Maosong Sun
Ge Yu
RALM
88
1
0
26 Feb 2025
Enhancing Human Evaluation in Machine Translation with Comparative Judgment
Yixiao Song
Parker Riley
Daniel Deutsch
Markus Freitag
55
1
0
25 Feb 2025
Investigating Non-Transitivity in LLM-as-a-Judge
Yi Xu
Laura Ruis
Tim Rocktäschel
Robert Kirk
38
0
0
19 Feb 2025
An Empirical Analysis of Uncertainty in Large Language Model Evaluations
Qiujie Xie
Qingqiu Li
Zhuohao Yu
Yuejie Zhang
Yue Zhang
Linyi Yang
ELM
58
1
0
15 Feb 2025
AI Alignment at Your Discretion
Maarten Buyl
Hadi Khalaf
C. M. Verdun
Lucas Monteiro Paes
Caio Vieira Machado
Flavio du Pin Calmon
33
0
0
10 Feb 2025
Self-Supervised Prompt Optimization
Jinyu Xiang
Jiayi Zhang
Zhaoyang Yu
Fengwei Teng
Jinhao Tu
Xinbing Liang
Sirui Hong
Chenglin Wu
Yuyu Luo
OffRL
LRM
57
5
0
07 Feb 2025
CalibraEval: Calibrating Prediction Distribution to Mitigate Selection Bias in LLMs-as-Judges
Haitao Li
Junjie Chen
Qingyao Ai
Zhumin Chu
Yujia Zhou
Qian Dong
Yiqun Liu
30
8
0
20 Oct 2024
Anchored Alignment for Self-Explanations Enhancement
Luis Felipe Villa-Arenas
Ata Nizamoglu
Qianli Wang
Sebastian Möller
Vera Schmitt
14
0
0
17 Oct 2024
TradExpert: Revolutionizing Trading with Mixture of Expert LLMs
Qianggang Ding
Haochen Shi
Jiadong Guo
Bang Liu
AIFin
31
3
0
16 Oct 2024
Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up
Jiahao Yuan
Dehui Du
Hao Zhang
Zixiang Di
Usman Naseem
LRM
24
1
0
16 Oct 2024
DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing
Shreya Shankar
Tristan Chambers
Eugene Wu
Aditya G. Parameswaran
LLMAG
47
5
0
16 Oct 2024
Understanding Likelihood Over-optimisation in Direct Alignment Algorithms
Zhengyan Shi
Sander Land
Acyr F. Locatelli
Matthieu Geist
Max Bartolo
46
3
0
15 Oct 2024
JurEE not Judges: safeguarding llm interactions with small, specialised Encoder Ensembles
Dom Nasrabadi
24
0
0
11 Oct 2024
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
Xiaosen Zheng
Tianyu Pang
Chao Du
Qian Liu
Jing Jiang
Min-Bin Lin
33
8
0
09 Oct 2024
Rationale-Aware Answer Verification by Pairwise Self-Evaluation
Akira Kawabata
Saku Sugawara
LRM
28
2
0
07 Oct 2024
Aligning with Logic: Measuring, Evaluating and Improving Logical Preference Consistency in Large Language Models
Yinhong Liu
Zhijiang Guo
Tianya Liang
Ehsan Shareghi
Ivan Vulić
Nigel Collier
67
0
0
03 Oct 2024
Agents' Room: Narrative Generation through Multi-step Collaboration
Fantine Huot
Reinald Kim Amplayo
Jennimaria Palomaki
Alice Shoshana Jakobovits
Elizabeth Clark
Mirella Lapata
43
7
0
03 Oct 2024
Finetuning LLMs for Comparative Assessment Tasks
Vatsal Raina
Adian Liusie
Mark J. F. Gales
24
0
0
24 Sep 2024
Generating Visual Stories with Grounded and Coreferent Characters
Danyang Liu
Mirella Lapata
Frank Keller
13
2
0
20 Sep 2024
LLM-as-a-Judge & Reward Model: What They Can and Cannot Do
Guijin Son
Hyunwoo Ko
Hoyoung Lee
Yewon Kim
Seunghyeok Hong
ALM
ELM
30
5
0
17 Sep 2024
LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs
Yuhao Wu
Ming Shan Hee
Zhiqing Hu
Roy Ka-Wei Lee
RALM
20
0
0
03 Sep 2024
Reference-Guided Verdict: LLMs-as-Judges in Automatic Evaluation of Free-Form Text
Sher Badshah
Hassan Sajjad
ELM
36
9
0
17 Aug 2024
Evaluating the Evaluator: Measuring LLMs' Adherence to Task Evaluation Instructions
Bhuvanashree Murugadoss
Christian Poelitz
Ian Drosos
Vu Le
Nick McKenna
Carina Negreanu
Chris Parnin
Advait Sarkar
ELM
ALM
22
3
0
16 Aug 2024
Attention Instruction: Amplifying Attention in the Middle via Prompting
Meiru Zhang
Zaiqiao Meng
Nigel Collier
36
4
0
24 Jun 2024
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Aman Singh Thakur
Kartik Choudhary
Venkat Srinik Ramayapally
Sankaran Vaidyanathan
Dieuwke Hupkes
ELM
ALM
45
55
0
18 Jun 2024
Can LLM be a Personalized Judge?
Yijiang River Dong
Tiancheng Hu
Nigel Collier
ELM
27
15
0
17 Jun 2024
Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments
Han Zhou
Xingchen Wan
Yinhong Liu
Nigel Collier
Ivan Vulić
Anna Korhonen
ALM
21
9
0
17 Jun 2024
Grade Like a Human: Rethinking Automated Assessment with Large Language Models
Wenjing Xie
Juxin Niu
Chun Jason Xue
Nan Guan
AI4Ed
28
0
0
30 May 2024
Efficient LLM Comparative Assessment: a Product of Experts Framework for Pairwise Comparisons
Adian Liusie
Vatsal Raina
Yassir Fathullah
Mark J. F. Gales
37
8
0
09 May 2024
Unifying Bias and Unfairness in Information Retrieval: A Survey of Challenges and Opportunities with Large Language Models
Sunhao Dai
Chen Xu
Shicheng Xu
Liang Pang
Zhenhua Dong
Jun Xu
39
60
0
17 Apr 2024
Prediction-Powered Ranking of Large Language Models
Ivi Chatzi
Eleni Straitouri
Suhas Thejaswi
Manuel Gomez Rodriguez
ALM
24
5
0
27 Feb 2024
Unlocking Structure Measuring: Introducing PDD, an Automatic Metric for Positional Discourse Coherence
Yinhong Liu
Yixuan Su
Ehsan Shareghi
Nigel Collier
25
4
0
15 Feb 2024
TOAD: Task-Oriented Automatic Dialogs with Diverse Response Styles
Yinhong Liu
Yimai Fang
David Vandyke
Nigel Collier
19
3
0
15 Feb 2024
PROXYQA: An Alternative Framework for Evaluating Long-Form Text Generation with Large Language Models
Haochen Tan
Zhijiang Guo
Zhan Shi
Lu Xu
Zhili Liu
...
Xiaoguang Li
Yasheng Wang
Lifeng Shang
Qun Liu
Linqi Song
19
12
0
26 Jan 2024
Generative Calibration for In-context Learning
Zhongtao Jiang
Yuanzhe Zhang
Cao Liu
Jun Zhao
Kang Liu
149
8
0
16 Oct 2023