arXiv: 2307.07889
LLM Comparative Assessment: Zero-shot NLG Evaluation through Pairwise Comparisons using Large Language Models
Adian Liusie, Potsawee Manakul, Mark J. F. Gales
15 July 2023
Tags: ELM
Links: ArXiv | PDF | HTML

Cited By
Papers citing "LLM Comparative Assessment: Zero-shot NLG Evaluation through Pairwise Comparisons using Large Language Models" (29 of 29 papers shown)
HypoEval: Hypothesis-Guided Evaluation for Natural Language Generation (09 Apr 2025)
Mingxuan Li, Hanchen Li, Chenhao Tan. Tags: ALM, ELM.

Can AI Master Construction Management (CM)? Benchmarking State-of-the-Art Large Language Models on CM Certification Exams (04 Apr 2025)
Ruoxin Xiong, Yanyu Wang, Suat Gunhan, Yimin Zhu, Charles Berryman. Tags: ELM.

Improving Preference Extraction In LLMs By Identifying Latent Knowledge Through Classifying Probes (22 Mar 2025)
Sharan Maiya, Yinhong Liu, Ramit Debnath, Anna Korhonen.

Who Relies More on World Knowledge and Bias for Syntactic Ambiguity Resolution: Humans or LLMs? (13 Mar 2025)
So Young Lee, Russell Scheinberg, Amber Shore, Ameeta Agrawal.

Safer or Luckier? LLMs as Safety Evaluators Are Not Robust to Artifacts (12 Mar 2025)
Hongyu Chen, Seraphina Goldfarb-Tarrant.

Evaluation of the Automated Labeling Method for Taxonomic Nomenclature Through Prompt-Optimized Large Language Model (08 Mar 2025)
Keito Inoshita, Kota Nojiri, Haruto Sugeno, Takumi Taga.

Investigating Non-Transitivity in LLM-as-a-Judge (19 Feb 2025)
Yi Xu, Laura Ruis, Tim Rocktaschel, Robert Kirk.

Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (20 Jan 2025)
Yinhong Liu, Han Zhou, Zhijiang Guo, Ehsan Shareghi, Ivan Vulić, Anna Korhonen, Nigel Collier. Tags: ALM.

Leveraging Large Language Models for Comparative Literature Summarization with Reflective Incremental Mechanisms (03 Dec 2024)
Fernando Gabriela Garcia, Spencer Burns, Harrison Fuller.

Bayesian Calibration of Win Rate Estimation with LLM Evaluators (07 Nov 2024)
Yicheng Gao, G. Xu, Zhe Wang, Arman Cohan.

SkillAggregation: Reference-free LLM-Dependent Aggregation (14 Oct 2024)
Guangzhi Sun, Anmol Kagrecha, Potsawee Manakul, Phil Woodland, Mark J. F. Gales.

Calibrate to Discriminate: Improve In-Context Learning with Label-Free Comparative Inference (03 Oct 2024)
Wei Cheng, Tianlu Wang, Yanmin Ji, Fan Yang, Keren Tan, Yiyu Zheng.

Aligning with Logic: Measuring, Evaluating and Improving Logical Preference Consistency in Large Language Models (03 Oct 2024)
Yinhong Liu, Zhijiang Guo, Tianya Liang, Ehsan Shareghi, Ivan Vulić, Nigel Collier.

Training Language Models to Win Debates with Self-Play Improves Judge Accuracy (25 Sep 2024)
Samuel Arnesen, David Rein, Julian Michael. Tags: ELM.

From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks (06 Sep 2024)
Andreas Stephan, D. Zhu, Matthias Aßenmacher, Xiaoyu Shen, Benjamin Roth. Tags: ELM.

Grammatical Error Feedback: An Implicit Evaluation Approach (18 Aug 2024)
Stefano Bannò, Kate Knill, Mark J. F. Gales.

Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges (18 Jun 2024)
Aman Singh Thakur, Kartik Choudhary, Venkat Srinik Ramayapally, Sankaran Vaidyanathan, Dieuwke Hupkes. Tags: ELM, ALM.

Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments (17 Jun 2024)
Han Zhou, Xingchen Wan, Yinhong Liu, Nigel Collier, Ivan Vulić, Anna Korhonen. Tags: ALM.

Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices (06 Jun 2024)
Ruiyang Qin, Dancheng Liu, Zheyu Yan, Zhaoxuan Tan, Zixuan Pan, Zhenge Jia, Meng-Long Jiang, Ahmed Abbasi, Jinjun Xiong, Yiyu Shi.

Grade Like a Human: Rethinking Automated Assessment with Large Language Models (30 May 2024)
Wenjing Xie, Juxin Niu, Chun Jason Xue, Nan Guan. Tags: AI4Ed.

Efficient LLM Comparative Assessment: a Product of Experts Framework for Pairwise Comparisons (09 May 2024)
Adian Liusie, Vatsal Raina, Yassir Fathullah, Mark J. F. Gales.

LLMs for Generating and Evaluating Counterfactuals: A Comprehensive Study (26 Apr 2024)
Van Bach Nguyen, Paul Youssef, Jorg Schlotterer, Christin Seifert.

PairEval: Open-domain Dialogue Evaluation with Pairwise Comparison (01 Apr 2024)
ChaeHun Park, Minseok Choi, Dohyun Lee, Jaegul Choo.

Prediction-Powered Ranking of Large Language Models (27 Feb 2024)
Ivi Chatzi, Eleni Straitouri, Suhas Thejaswi, Manuel Gomez Rodriguez. Tags: ALM.

LLM-based NLG Evaluation: Current Status and Challenges (02 Feb 2024)
Mingqi Gao, Xinyu Hu, Jie Ruan, Xiao Pu, Xiaojun Wan. Tags: ELM, LM&MA.

Typhoon: Thai Large Language Models (21 Dec 2023)
Kunat Pipatanakul, Phatrasek Jirabovonvisut, Potsawee Manakul, Sittipong Sripaisarnmongkol, Ruangsak Patomwong, Pathomporn Chokchainant, Kasima Tharnpipitchai.

Zero-shot Audio Topic Reranking using Large Language Models (14 Sep 2023)
Mengjie Qian, Rao Ma, Adian Liusie, Erfan Loweimi, Kate Knill, Mark J. F. Gales.

Can Large Language Models Be an Alternative to Human Evaluations? (03 May 2023)
Cheng-Han Chiang, Hung-yi Lee. Tags: ALM, LM&MA.

SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models (15 Mar 2023)
Potsawee Manakul, Adian Liusie, Mark J. F. Gales. Tags: HILM, LRM.