ResearchTrend.AI


LLM Comparative Assessment: Zero-shot NLG Evaluation through Pairwise Comparisons using Large Language Models

15 July 2023
Adian Liusie, Potsawee Manakul, Mark J. F. Gales · ELM

Papers citing "LLM Comparative Assessment: Zero-shot NLG Evaluation through Pairwise Comparisons using Large Language Models"

29 / 29 papers shown
HypoEval: Hypothesis-Guided Evaluation for Natural Language Generation
Mingxuan Li, Hanchen Li, Chenhao Tan · ALM, ELM · 09 Apr 2025

Can AI Master Construction Management (CM)? Benchmarking State-of-the-Art Large Language Models on CM Certification Exams
Ruoxin Xiong, Yanyu Wang, Suat Gunhan, Yimin Zhu, Charles Berryman · ELM · 04 Apr 2025

Improving Preference Extraction In LLMs By Identifying Latent Knowledge Through Classifying Probes
Sharan Maiya, Yinhong Liu, Ramit Debnath, Anna Korhonen · 22 Mar 2025

Who Relies More on World Knowledge and Bias for Syntactic Ambiguity Resolution: Humans or LLMs?
So Young Lee, Russell Scheinberg, Amber Shore, Ameeta Agrawal · 13 Mar 2025

Safer or Luckier? LLMs as Safety Evaluators Are Not Robust to Artifacts
Hongyu Chen, Seraphina Goldfarb-Tarrant · 12 Mar 2025

Evaluation of the Automated Labeling Method for Taxonomic Nomenclature Through Prompt-Optimized Large Language Model
Keito Inoshita, Kota Nojiri, Haruto Sugeno, Takumi Taga · 08 Mar 2025

Investigating Non-Transitivity in LLM-as-a-Judge
Yi Xu, Laura Ruis, Tim Rocktaschel, Robert Kirk · 19 Feb 2025

Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators
Yinhong Liu, Han Zhou, Zhijiang Guo, Ehsan Shareghi, Ivan Vulić, Anna Korhonen, Nigel Collier · ALM · 20 Jan 2025

Leveraging Large Language Models for Comparative Literature Summarization with Reflective Incremental Mechanisms
Fernando Gabriela Garcia, Spencer Burns, Harrison Fuller · 03 Dec 2024

Bayesian Calibration of Win Rate Estimation with LLM Evaluators
Yicheng Gao, G. Xu, Zhe Wang, Arman Cohan · 07 Nov 2024

SkillAggregation: Reference-free LLM-Dependent Aggregation
Guangzhi Sun, Anmol Kagrecha, Potsawee Manakul, Phil Woodland, Mark J. F. Gales · 14 Oct 2024

Calibrate to Discriminate: Improve In-Context Learning with Label-Free Comparative Inference
Wei Cheng, Tianlu Wang, Yanmin Ji, Fan Yang, Keren Tan, Yiyu Zheng · 03 Oct 2024

Aligning with Logic: Measuring, Evaluating and Improving Logical Preference Consistency in Large Language Models
Yinhong Liu, Zhijiang Guo, Tianya Liang, Ehsan Shareghi, Ivan Vulić, Nigel Collier · 03 Oct 2024

Training Language Models to Win Debates with Self-Play Improves Judge Accuracy
Samuel Arnesen, David Rein, Julian Michael · ELM · 25 Sep 2024

From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks
Andreas Stephan, D. Zhu, Matthias Aßenmacher, Xiaoyu Shen, Benjamin Roth · ELM · 06 Sep 2024

Grammatical Error Feedback: An Implicit Evaluation Approach
Stefano Bannò, Kate Knill, Mark J. F. Gales · 18 Aug 2024

Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Aman Singh Thakur, Kartik Choudhary, Venkat Srinik Ramayapally, Sankaran Vaidyanathan, Dieuwke Hupkes · ELM, ALM · 18 Jun 2024

Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments
Han Zhou, Xingchen Wan, Yinhong Liu, Nigel Collier, Ivan Vulić, Anna Korhonen · ALM · 17 Jun 2024

Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices
Ruiyang Qin, Dancheng Liu, Zheyu Yan, Zhaoxuan Tan, Zixuan Pan, Zhenge Jia, Meng-Long Jiang, Ahmed Abbasi, Jinjun Xiong, Yiyu Shi · 06 Jun 2024

Grade Like a Human: Rethinking Automated Assessment with Large Language Models
Wenjing Xie, Juxin Niu, Chun Jason Xue, Nan Guan · AI4Ed · 30 May 2024

Efficient LLM Comparative Assessment: a Product of Experts Framework for Pairwise Comparisons
Adian Liusie, Vatsal Raina, Yassir Fathullah, Mark J. F. Gales · 09 May 2024

LLMs for Generating and Evaluating Counterfactuals: A Comprehensive Study
Van Bach Nguyen, Paul Youssef, Jorg Schlotterer, Christin Seifert · 26 Apr 2024

PairEval: Open-domain Dialogue Evaluation with Pairwise Comparison
chaeHun Park, Minseok Choi, Dohyun Lee, Jaegul Choo · 01 Apr 2024

Prediction-Powered Ranking of Large Language Models
Ivi Chatzi, Eleni Straitouri, Suhas Thejaswi, Manuel Gomez Rodriguez · ALM · 27 Feb 2024

LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao, Xinyu Hu, Jie Ruan, Xiao Pu, Xiaojun Wan · ELM, LM&MA · 02 Feb 2024

Typhoon: Thai Large Language Models
Kunat Pipatanakul, Phatrasek Jirabovonvisut, Potsawee Manakul, Sittipong Sripaisarnmongkol, Ruangsak Patomwong, Pathomporn Chokchainant, Kasima Tharnpipitchai · 21 Dec 2023

Zero-shot Audio Topic Reranking using Large Language Models
Mengjie Qian, Rao Ma, Adian Liusie, Erfan Loweimi, Kate Knill, Mark J. F. Gales · 14 Sep 2023

Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang, Hung-yi Lee · ALM, LM&MA · 03 May 2023

SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
Potsawee Manakul, Adian Liusie, Mark J. F. Gales · HILM, LRM · 15 Mar 2023