ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2303.04048
  4. Cited By
Is ChatGPT a Good NLG Evaluator? A Preliminary Study

Is ChatGPT a Good NLG Evaluator? A Preliminary Study

7 March 2023
Jiaan Wang
Yunlong Liang
Fandong Meng
Zengkui Sun
Haoxiang Shi
Zhixu Li
Jinan Xu
Jianfeng Qu
Jie Zhou
    LM&MA
    ELM
    ALM
    AI4MH
ArXivPDFHTML

Papers citing "Is ChatGPT a Good NLG Evaluator? A Preliminary Study"

44 / 44 papers shown
Title
Towards Efficient Educational Chatbots: Benchmarking RAG Frameworks
Umar Ali Khan
Ekram Khan
Fiza Khan
A. A. Moinuddin
48
0
0
02 Mar 2025
Pastiche Novel Generation Creating: Fan Fiction You Love in Your Favorite Author's Style
Pastiche Novel Generation Creating: Fan Fiction You Love in Your Favorite Author's Style
Xueran Han
Yuhan Liu
Mingzhe Li
W. Liu
Sen Hu
Rui Yan
Zhiqiang Xu
Xiuying Chen
59
0
0
24 Feb 2025
Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation
Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation
SeongYeub Chu
JongWoo Kim
MunYong Yi
53
1
0
21 Feb 2025
MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models
MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models
Zihao Wei
Jingcheng Deng
Liang Pang
Hanxing Ding
Huawei Shen
Xueqi Cheng
KELM
78
4
0
20 Feb 2025
From Local to Global: A Graph RAG Approach to Query-Focused Summarization
From Local to Global: A Graph RAG Approach to Query-Focused Summarization
Darren Edge
Ha Trinh
Newman Cheng
Joshua Bradley
Alex Chao
Apurva Mody
Steven Truitt
Dasha Metropolitansky
Robert Osazuwa Ness
Jonathan Larson
RALM
132
314
0
20 Feb 2025
Rate, Explain and Cite (REC): Enhanced Explanation and Attribution in Automatic Evaluation by Large Language Models
Rate, Explain and Cite (REC): Enhanced Explanation and Attribution in Automatic Evaluation by Large Language Models
Aliyah R. Hsu
James Zhu
Zhichao Wang
Bin Bi
Shubham Mehrotra
...
Sougata Chaudhuri
Regunathan Radhakrishnan
S. Asur
Claire Na Cheng
Bin Yu
ALM
LRM
67
0
0
20 Feb 2025
Can Knowledge Graphs Make Large Language Models More Trustworthy? An Empirical Study Over Open-ended Question Answering
Can Knowledge Graphs Make Large Language Models More Trustworthy? An Empirical Study Over Open-ended Question Answering
Yuan Sui
Yufei He
Zifeng Ding
Bryan Hooi
HILM
ELM
RALM
64
7
0
20 Feb 2025
Towards Reasoning Ability of Small Language Models
Towards Reasoning Ability of Small Language Models
Gaurav Srivastava
Shuxiang Cao
Xuan Wang
ReLM
LRM
49
4
0
17 Feb 2025
Evaluating Small Language Models for News Summarization: Implications and Factors Influencing Performance
Evaluating Small Language Models for News Summarization: Implications and Factors Influencing Performance
Borui Xu
Yao Chen
Zeyi Wen
Weiguo Liu
Bingsheng He
62
1
0
02 Feb 2025
Learning to Summarize from LLM-generated Feedback
Learning to Summarize from LLM-generated Feedback
Hwanjun Song
Taewon Yun
Yuho Lee
Jihwan Oh
Gihun Lee
Jason (Jinglun) Cai
Hang Su
73
1
0
28 Jan 2025
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators
Yinhong Liu
Han Zhou
Zhijiang Guo
Ehsan Shareghi
Ivan Vulić
Anna Korhonen
Nigel Collier
ALM
128
64
0
20 Jan 2025
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset
Yingzi Ma
Jiongxiao Wang
Fei-Yue Wang
Siyuan Ma
Jiazhao Li
...
B. Li
Yejin Choi
M. Chen
Chaowei Xiao
Chaowei Xiao
MU
49
6
0
05 Nov 2024
On A Scale From 1 to 5: Quantifying Hallucination in Faithfulness Evaluation
On A Scale From 1 to 5: Quantifying Hallucination in Faithfulness Evaluation
Xiaonan Jing
Srinivas Billa
Danny Godbout
HILM
33
0
0
16 Oct 2024
Beyond Exact Match: Semantically Reassessing Event Extraction by Large Language Models
Beyond Exact Match: Semantically Reassessing Event Extraction by Large Language Models
Yi-Fan Lu
Xian-Ling Mao
Tian Lan
Heyan Huang
Heyan Huang
Xiaoyan Gao
45
0
0
12 Oct 2024
Data Processing for the OpenGPT-X Model Family
Data Processing for the OpenGPT-X Model Family
Nicolo' Brandizzi
Hammam Abdelwahab
Anirban Bhowmick
Lennard Helmer
Benny Jörg Stein
...
Georg Rehm
Dennis Wegener
Nicolas Flores-Herr
Joachim Kohler
Johannes Leveling
VLM
79
2
0
11 Oct 2024
CREAM: Comparison-Based Reference-Free ELO-Ranked Automatic Evaluation
  for Meeting Summarization
CREAM: Comparison-Based Reference-Free ELO-Ranked Automatic Evaluation for Meeting Summarization
Ziwei Gong
Lin Ai
Harshsaiprasad Deshpande
Alexander Johnson
Emmy Phung
Zehui Wu
Ahmad Emami
Julia Hirschberg
23
2
0
17 Sep 2024
GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering
GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering
Sacha Muller
António Loison
Bilel Omrani
Gautier Viaud
RALM
ELM
26
1
0
10 Sep 2024
XG-NID: Dual-Modality Network Intrusion Detection using a Heterogeneous Graph Neural Network and Large Language Model
XG-NID: Dual-Modality Network Intrusion Detection using a Heterogeneous Graph Neural Network and Large Language Model
Yasir Ali Farrukh
S. Wali
I. Khan
Nathaniel D. Bastian
47
2
0
27 Aug 2024
DHP Benchmark: Are LLMs Good NLG Evaluators?
DHP Benchmark: Are LLMs Good NLG Evaluators?
Yicheng Wang
Jiayi Yuan
Yu-Neng Chuang
Zhuoer Wang
Yingchi Liu
Mark Cusick
Param Kulkarni
Zhengping Ji
Yasser Ibrahim
Xia Hu
LM&MA
ELM
41
3
0
25 Aug 2024
Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates
Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates
Hui Wei
Shenghua He
Tian Xia
Andy H. Wong
Jingyang Lin
Mei Han
Mei Han
ALM
ELM
59
22
0
23 Aug 2024
Enhancing Emotion Prediction in News Headlines: Insights from ChatGPT
  and Seq2Seq Models for Free-Text Generation
Enhancing Emotion Prediction in News Headlines: Insights from ChatGPT and Seq2Seq Models for Free-Text Generation
Ge Gao
Jongin Kim
Sejin Paik
Ekaterina Novozhilova
Yi Liu
Sarah Bonna
Margrit Betke
Derry Wijaya
26
0
0
14 Jul 2024
EventChat: Implementation and user-centric evaluation of a large
  language model-driven conversational recommender system for exploring leisure
  events in an SME context
EventChat: Implementation and user-centric evaluation of a large language model-driven conversational recommender system for exploring leisure events in an SME context
Hannes Kunstmann
J. Ollier
Joel Persson
F. Wangenheim
33
0
0
05 Jul 2024
FineSurE: Fine-grained Summarization Evaluation using LLMs
FineSurE: Fine-grained Summarization Evaluation using LLMs
Hwanjun Song
Hang Su
Igor Shalyminov
Jason (Jinglun) Cai
Saab Mansour
HILM
23
29
0
01 Jul 2024
Large Language Models as Evaluators for Recommendation Explanations
Large Language Models as Evaluators for Recommendation Explanations
Xiaoyu Zhang
Yishan Li
Jiayin Wang
Bowen Sun
Weizhi Ma
Peijie Sun
Min Zhang
LRM
ELM
32
12
0
05 Jun 2024
XRec: Large Language Models for Explainable Recommendation
XRec: Large Language Models for Explainable Recommendation
Qiyao Ma
Xubin Ren
Chao Huang
LRM
23
17
0
04 Jun 2024
Uncovering Bias in Large Vision-Language Models at Scale with Counterfactuals
Uncovering Bias in Large Vision-Language Models at Scale with Counterfactuals
Phillip Howard
Kathleen C. Fraser
Anahita Bhiwandiwalla
S. Kiritchenko
48
9
0
30 May 2024
CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and
  Evaluation Framework for Chinese Psychological Counseling
CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling
Chenhao Zhang
Renhao Li
Minghuan Tan
Min Yang
Jingwei Zhu
Di Yang
Jiahao Zhao
Guancheng Ye
Chengming Li
Xiping Hu
31
18
0
26 May 2024
SLIDE: A Framework Integrating Small and Large Language Models for
  Open-Domain Dialogues Evaluation
SLIDE: A Framework Integrating Small and Large Language Models for Open-Domain Dialogues Evaluation
Kun Zhao
Bohao Yang
Chen Tang
Chenghua Lin
Liang Zhan
28
5
0
24 May 2024
CHARP: Conversation History AwaReness Probing for Knowledge-grounded
  Dialogue Systems
CHARP: Conversation History AwaReness Probing for Knowledge-grounded Dialogue Systems
Abbas Ghaddar
David Alfonso-Hermelo
Philippe Langlais
Mehdi Rezagholizadeh
Boxing Chen
Prasanna Parthasarathi
29
0
0
24 May 2024
Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer
  Selection in Large Language Models
Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models
Zhangyue Yin
Qiushi Sun
Qipeng Guo
Zhiyuan Zeng
Xiaonan Li
...
Qinyuan Cheng
Ding Wang
Xiaofeng Mou
Xipeng Qiu
XuanJing Huang
LRM
41
3
0
21 May 2024
CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large
  Language Models
CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models
Haoxiang Shi
Jiaan Wang
Jiarong Xu
Cen Wang
Tetsuya Sakai
LMTD
21
0
0
20 May 2024
LLMEval: A Preliminary Study on How to Evaluate Large Language Models
LLMEval: A Preliminary Study on How to Evaluate Large Language Models
Yue Zhang
Ming Zhang
Haipeng Yuan
Shichun Liu
Yongyao Shi
Tao Gui
Qi Zhang
Xuanjing Huang
ALM
ELM
11
10
0
12 Dec 2023
Universal Self-Consistency for Large Language Model Generation
Universal Self-Consistency for Large Language Model Generation
Xinyun Chen
Renat Aksitov
Uri Alon
Jie Jessie Ren
Kefan Xiao
Pengcheng Yin
Sushant Prakash
Charles Sutton
Xuezhi Wang
Denny Zhou
LRM
24
65
0
29 Nov 2023
Language Models Hallucinate, but May Excel at Fact Verification
Language Models Hallucinate, but May Excel at Fact Verification
Jian-Yu Guan
Jesse Dodge
David Wadden
Minlie Huang
Hao Peng
LRM
HILM
15
28
0
23 Oct 2023
Ragas: Automated Evaluation of Retrieval Augmented Generation
Ragas: Automated Evaluation of Retrieval Augmented Generation
ES Shahul
Jithin James
Luis Espinosa-Anke
Steven Schockaert
80
170
0
26 Sep 2023
Automatic Answerability Evaluation for Question Generation
Automatic Answerability Evaluation for Question Generation
Zifan Wang
Kotaro Funakoshi
Manabu Okumura
11
2
0
22 Sep 2023
Instruction Position Matters in Sequence Generation with Large Language
  Models
Instruction Position Matters in Sequence Generation with Large Language Models
Yanjun Liu
Xianfeng Zeng
Fandong Meng
Jie Zhou
LRM
35
8
0
23 Aug 2023
Learning Evaluation Models from Large Language Models for Sequence Generation
Learning Evaluation Models from Large Language Models for Sequence Generation
Chenglong Wang
Hang Zhou
Kai-Chun Chang
Tongran Liu
Chunliang Zhang
Quan Du
Tong Xiao
Yue Zhang
Jingbo Zhu
ELM
34
3
0
08 Aug 2023
ChatAgri: Exploring Potentials of ChatGPT on Cross-linguistic
  Agricultural Text Classification
ChatAgri: Exploring Potentials of ChatGPT on Cross-linguistic Agricultural Text Classification
Biao Zhao
Weiqiang Jin
Javier Del Ser
Guangyao Yang
19
55
0
24 May 2023
Chain-of-Dictionary Prompting Elicits Translation in Large Language
  Models
Chain-of-Dictionary Prompting Elicits Translation in Large Language Models
Hongyuan Lu
Haoran Yang
Haoyang Huang
Dongdong Zhang
Wai Lam
Furu Wei
LRM
AI4CE
25
15
0
11 May 2023
Are Large Language Models Ready for Healthcare? A Comparative Study on
  Clinical Language Understanding
Are Large Language Models Ready for Healthcare? A Comparative Study on Clinical Language Understanding
Yuqing Wang
Yun Zhao
Linda R. Petzold
AI4MH
LM&MA
ELM
11
50
0
09 Apr 2023
ChatGPT Makes Medicine Easy to Swallow: An Exploratory Case Study on
  Simplified Radiology Reports
ChatGPT Makes Medicine Easy to Swallow: An Exploratory Case Study on Simplified Radiology Reports
Katharina Jeblick
B. Schachtner
Jakob Dexl
Andreas Mittermeier
Anna Theresa Stüber
...
Tobias Weber
Philipp Wesp
B. Sabel
J. Ricke
Michael Ingrisch
LM&MA
MedIm
103
368
0
30 Dec 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Teaching Machines to Read and Comprehend
Teaching Machines to Read and Comprehend
Karl Moritz Hermann
Tomás Kociský
Edward Grefenstette
L. Espeholt
W. Kay
Mustafa Suleyman
Phil Blunsom
170
3,504
0
10 Jun 2015
1