G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment


29 March 2023
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
    ELM
    ALM
    LM&MA

Papers citing "G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment"

50 / 145 papers shown
Title
Why Are You Wrong? Counterfactual Explanations for Language Grounding with 3D Objects
Tobias Preintner
Weixuan Yuan
Qi Huang
Adrian König
Thomas Bäck
E. Raponi
N. V. Stein
19
0
0
09 May 2025
clem:todd: A Framework for the Systematic Benchmarking of LLM-Based Task-Oriented Dialogue System Realisations
Chalamalasetti Kranti
Sherzod Hakimov
David Schlangen
LLMAG
38
0
0
08 May 2025
SEval-Ex: A Statement-Level Framework for Explainable Summarization Evaluation
Tanguy Herserant
Vincent Guigue
ELM
33
0
0
04 May 2025
LecEval: An Automated Metric for Multimodal Knowledge Acquisition in Multimedia Learning
Joy Lim Jia Yin
Daniel Zhang-Li
Jifan Yu
H. Li
Shangqing Tu
...
Zhiyuan Liu
Huiqin Liu
Lei Hou
Juanzi Li
Bin Xu
22
0
0
04 May 2025
LookAlike: Consistent Distractor Generation in Math MCQs
Nisarg Parikh
Nigel Fernandez
Alexander Scarlatos
Simon Woodhead
Andrew S. Lan
41
0
0
03 May 2025
Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems
Shaokun Zhang
Ming Yin
Jieyu Zhang
J. H. Liu
Zhiguang Han
...
Beibin Li
Chi Wang
H. Wang
Y. Chen
Qingyun Wu
47
0
0
30 Apr 2025
Automatic Legal Writing Evaluation of LLMs
Ramon Pires
Roseval Malaquias Junior
Rodrigo Nogueira
AILaw
ELM
81
0
0
29 Apr 2025
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models
Mihai Nadas
Laura Diosan
Andrei Piscoran
Andreea Tomescu
VGen
50
0
0
29 Apr 2025
JaccDiv: A Metric and Benchmark for Quantifying Diversity of Generated Marketing Text in the Music Industry
Anum Afzal
Alexandre Mercier
Florian Matthes
52
0
0
29 Apr 2025
Evaluate-and-Purify: Fortifying Code Language Models Against Adversarial Attacks Using LLM-as-a-Judge
Wenhan Mu
Ling Xu
Shuren Pei
Le Mi
Huichi Zhou
AAML
ELM
48
0
0
28 Apr 2025
LLM-Evaluation Tropes: Perspectives on the Validity of LLM-Evaluations
Laura Dietz
Oleg Zendel
P. Bailey
Charles L. A. Clarke
Ellese Cotterill
Jeff Dalton
Faegheh Hasibi
Mark Sanderson
Nick Craswell
ELM
43
0
0
27 Apr 2025
KETCHUP: K-Step Return Estimation for Sequential Knowledge Distillation
Jiabin Fan
Guoqing Luo
Michael Bowling
Lili Mou
OffRL
61
0
0
26 Apr 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
X. Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Yu Jiang
ALM
ELM
84
0
0
26 Apr 2025
Adversarial Attacks on LLM-as-a-Judge Systems: Insights from Prompt Injections
Narek Maloyan
Dmitry Namiot
SILM
AAML
ELM
75
0
0
25 Apr 2025
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
Minju Seo
Jinheon Baek
Seongyun Lee
S. Hwang
AI4CE
35
0
0
24 Apr 2025
DocAgent: A Multi-Agent System for Automated Code Documentation Generation
Dayu Yang
Antoine Simoulin
Xin Qian
Xiaoyi Liu
Yuwei Cao
Zhaopu Teng
Grey Yang
LLMAG
54
0
0
11 Apr 2025
Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics
Hamed Mahdavi
Alireza Hashemi
Majid Daliri
Pegah Mohammadipour
Alireza Farhadi
Samira Malek
Yekta Yazdanifard
Amir Khasahmadi
V. Honavar
ELM
LRM
49
1
0
01 Apr 2025
DomainCQA: Crafting Expert-Level QA from Domain-Specific Charts
Ling Zhong
Yujing Lu
Jing Yang
Weiming Li
Peng Wei
Yongheng Wang
Manni Duan
Qing Zhang
45
0
0
25 Mar 2025
ConvoGen: Enhancing Conversational AI with Synthetic Data: A Multi-Agent Approach
Reem Gody
Mahmoud Goudy
Ahmed Tawfik
SyDa
74
0
0
21 Mar 2025
Unequal Opportunities: Examining the Bias in Geographical Recommendations by Large Language Models
Shiran Dudy
Thulasi Tholeti
R. Ramachandranpillai
Muhammad Ali
Toby Jia-Jun Li
Ricardo Baeza-Yates
21
0
0
16 Mar 2025
Take Off the Training Wheels! Progressive In-Context Learning for Effective Alignment
Zhenyu Liu
Dongfang Li
Xinshuo Hu
X. Zhao
Yibin Chen
Baotian Hu
Min-Ling Zhang
46
1
0
13 Mar 2025
WildIFEval: Instruction Following in the Wild
Gili Lior
Asaf Yehudai
Ariel Gera
L. Ein-Dor
64
0
0
09 Mar 2025
StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition
Xin Ding
Hao Wu
Y. Yang
Shiqi Jiang
Donglin Bai
Zhibo Chen
Ting Cao
50
0
0
08 Mar 2025
Topology-Aware Conformal Prediction for Stream Networks
Jifan Zhang
Fangxin Wang
Philip S. Yu
Kaize Ding
Shixiang Zhu
AI4TS
37
0
0
06 Mar 2025
SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction
Lu Dai
Yijie Xu
Jinhui Ye
Hao Liu
Hui Xiong
3DV
RALM
71
2
0
03 Mar 2025
How Diversely Can Language Models Solve Problems? Exploring the Algorithmic Diversity of Model-Generated Code
Seonghyeon Lee
Heejae Chon
Joonwon Jang
Dongha Lee
Hwanjo Yu
ALM
39
0
0
02 Mar 2025
A Pilot Empirical Study on When and How to Use Knowledge Graphs as Retrieval Augmented Generation
Xujie Yuan
Y. Liu
Shimin Di
Shiwen Wu
Libin Zheng
Rui Meng
Lei Chen
Xiaofang Zhou
Jian Yin
36
0
0
28 Feb 2025
Learning to Align Multi-Faceted Evaluation: A Unified and Robust Framework
Kaishuai Xu
Tiezheng YU
Wenjun Hou
Yi Cheng
Liangyou Li
Xin Jiang
Lifeng Shang
Q. Liu
Wenjie Li
ELM
66
0
0
26 Feb 2025
PiCO: Peer Review in LLMs based on the Consistency Optimization
Kun-Peng Ning
Shuo Yang
Yu-Yang Liu
Jia-Yu Yao
Zhen-Hui Liu
Yu Wang
Ming Pang
Li Yuan
ALM
69
8
0
24 Feb 2025
Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation
SeongYeub Chu
JongWoo Kim
MunYong Yi
53
1
0
21 Feb 2025
Rate, Explain and Cite (REC): Enhanced Explanation and Attribution in Automatic Evaluation by Large Language Models
Aliyah R. Hsu
James Zhu
Zhichao Wang
Bin Bi
Shubham Mehrotra
...
Sougata Chaudhuri
Regunathan Radhakrishnan
S. Asur
Claire Na Cheng
Bin Yu
ALM
LRM
67
0
0
20 Feb 2025
Cost-Saving LLM Cascades with Early Abstention
Michael J. Zellinger
Rex Liu
Matt Thomson
98
0
0
13 Feb 2025
Bridging LLM-Generated Code and Requirements: Reverse Generation technique and SBC Metric for Developer Insights
Ahilan Ayyachamy Nadar Ponnusamy
58
0
0
11 Feb 2025
SynDL: A Large-Scale Synthetic Test Collection for Passage Retrieval
Hossein A. Rahmani
Xi Wang
Emine Yilmaz
Nick Craswell
Bhaskar Mitra
Paul Thomas
79
4
0
28 Jan 2025
Learning to Summarize from LLM-generated Feedback
Hwanjun Song
Taewon Yun
Yuho Lee
Jihwan Oh
Gihun Lee
Jason (Jinglun) Cai
Hang Su
73
1
0
28 Jan 2025
Beyond correlation: The Impact of Human Uncertainty in Measuring the Effectiveness of Automatic Evaluation and LLM-as-a-Judge
Aparna Elangovan
Jongwoo Ko
Lei Xu
Mahsa Elyasi
Ling Liu
S. Bodapati
Dan Roth
35
5
0
28 Jan 2025
Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation
Mingqi Gao
Xinyu Hu
Li Lin
Xiaojun Wan
28
1
0
28 Jan 2025
MDEval: Evaluating and Enhancing Markdown Awareness in Large Language Models
Zhongpu Chen
Y. Liu
Long Shi
Zhi-Jie Wang
Xingyan Chen
Yu Zhao
Fuji Ren
41
0
0
28 Jan 2025
Learning to Explore and Select for Coverage-Conditioned Retrieval-Augmented Generation
Takyoung Kim
Kyungjae Lee
Y. Jang
Ji Yong Cho
Gangwoo Kim
Minseok Cho
Moontae Lee
83
0
0
28 Jan 2025
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators
Yinhong Liu
Han Zhou
Zhijiang Guo
Ehsan Shareghi
Ivan Vulić
Anna Korhonen
Nigel Collier
ALM
128
64
0
20 Jan 2025
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
Junyi Ao
Yuancheng Wang
Xiaohai Tian
Dekun Chen
J. Zhang
Lu Lu
Y. Wang
Haizhou Li
Z. Wu
AuLLM
77
16
0
17 Jan 2025
PASS: Presentation Automation for Slide Generation and Speech
Tushar Aggarwal
Aarohi Bhand
54
1
0
17 Jan 2025
CodEv: An Automated Grading Framework Leveraging Large Language Models for Consistent and Constructive Feedback
En-Qi Tseng
Pei-Cing Huang
Chan Hsu
Peng-Yi Wu
Chan-Tung Ku
Yihuang Kang
41
1
0
10 Jan 2025
PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations
Ruosen Li
Teerth Patel
Xinya Du
LLMAG
ALM
52
94
0
03 Jan 2025
LLM-Rubric: A Multidimensional, Calibrated Approach to Automated Evaluation of Natural Language Texts
Helia Hashemi
J. Eisner
Corby Rosset
Benjamin Van Durme
Chris Kedzie
65
1
0
03 Jan 2025
A 2-step Framework for Automated Literary Translation Evaluation: Its Promises and Pitfalls
Sheikh Shafayat
Dongkeun Yoon
Woori Jang
Jiwoo Choi
Alice H. Oh
Seohyon Jung
91
1
0
03 Jan 2025
Revisiting In-Context Learning with Long Context Language Models
Jinheon Baek
Sun Jae Lee
Prakhar Gupta
Geunseob Oh
Siddharth Dalmia
98
0
0
22 Dec 2024
Towards Automatic Evaluation for Image Transcreation
Simran Khanuja
Vivek Iyer
Claire He
Graham Neubig
ViT
77
1
0
18 Dec 2024
EventSum: A Large-Scale Event-Centric Summarization Dataset for Chinese Multi-News Documents
Mengna Zhu
Kaisheng Zeng
Mao Wang
Kaiming Xiao
Lei Hou
Hongbin Huang
Juanzi Li
102
1
0
16 Dec 2024
VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning
Xueqing Wu
Yuheng Ding
Bingxuan Li
Pan Lu
Da Yin
Kai-Wei Chang
Nanyun Peng
LRM
100
3
0
03 Dec 2024