ChatGPT as a Factual Inconsistency Evaluator for Text Summarization

27 March 2023
Zheheng Luo
Qianqian Xie
Sophia Ananiadou
ELM, HILM, ALM

Papers citing "ChatGPT as a Factual Inconsistency Evaluator for Text Summarization"

50 / 54 papers shown
OPOR-Bench: Evaluating Large Language Models on Online Public Opinion Report Generation
Jinzheng Yu
Yang Xu
Haozhen Li
Junqi Li
Yifan Feng
Ligu Zhu
Hao Shen
Lei Shi
ELM
309
2
0
01 Dec 2025
Learning to Reason for Hallucination Span Detection
Hsuan Su
Ting-Yao Hu
H. Koppula
Kundan Krishna
Hadi Pouransari
Cheng-Yu Hsieh
Cem Koc
Joseph Y Cheng
Oncel Tuzel
Raviteja Vemulapalli
ReLM, OffRL, HILM, LRM
302
3
0
02 Oct 2025
SCI-Verifier: Scientific Verifier with Thinking
Shenghe Zheng
Chenyu Huang
F. Yu
Junchi Yao
Jingqi Ye
...
Yun Luo
Ning Ding
Wenlong Zhang
Ganqu Cui
Peng Ye
LRM
175
3
0
29 Sep 2025
Fine-Grained Detection of Context-Grounded Hallucinations Using LLMs
Yehonatan Peisakhovsky
Zorik Gekhman
Y. Mass
Liat Ein-Dor
Roi Reichart
HILM
187
1
0
26 Sep 2025
Principled Detection of Hallucinations in Large Language Models via Multiple Testing
Jiawei Li
A. Magesh
Venugopal V. Veeravalli
HILM
266
2
0
25 Aug 2025
Your Agent Can Defend Itself against Backdoor Attacks
Li Changjiang
Liang Jiacheng
Cao Bochuan
Chen Jinghui
Wang Ting
AAML, LLMAG
417
5
0
10 Jun 2025
Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation
Noy Sternlicht
Ariel Gera
Roy Bar-Haim
Kyle Lo
Noam Slonim
ELM
353
0
0
05 Jun 2025
RvLLM: LLM Runtime Verification with Domain Knowledge
Yedi Zhang
Sun Yi Emma
Annabelle Lee Jia En
Jin Song Dong
460
7
0
24 May 2025
Long-Form Information Alignment Evaluation Beyond Atomic Facts
Danna Zheng
Mirella Lapata
Jeff Z. Pan
HILM
287
1
0
21 May 2025
Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards
Manveer Singh Tamber
F. S. Bao
Chenyu Xu
Ge Luo
Suleman Kazi
Minseok Bae
Miaoran Li
Ofer Mendelevitch
Renyi Qu
Jimmy J. Lin
VLM
471
8
0
07 May 2025
A Survey on Transformer Context Extension: Approaches and Evaluation
Yijun Liu
Jinzheng Yu
Yang Xu
Zhongyang Li
Qingfu Zhu
LLMAG
575
15
0
17 Mar 2025
GraphEval: A Lightweight Graph-Based LLM Framework for Idea Evaluation
International Conference on Learning Representations (ICLR), 2025
Tao Feng
Yihang Sun
Jiaxuan You
534
19
0
16 Mar 2025
Hallucination Detection in Large Language Models with Metamorphic Relations
Borui Yang
Md Afif Al Mamun
Jie M. Zhang
Gias Uddin
HILM
551
26
0
20 Feb 2025
SummExecEdit: A Factual Consistency Benchmark in Summarization with Executable Edits
Onkar Thorat
Philippe Laban
Chien-Sheng Wu
HILM
426
1
0
17 Dec 2024
Do Automatic Factuality Metrics Measure Factuality? A Critical Evaluation
S. Ramprasad
Byron C. Wallace
LLMAG, HILM
716
8
0
25 Nov 2024
Multi-hop Evidence Pursuit Meets the Web: Team Papelo at FEVER 2024
Christopher Malon
LRM
193
5
0
08 Nov 2024
FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
F. S. Bao
Miaoran Li
Renyi Qu
Ge Luo
Erana Wan
...
Ruixuan Tu
Chenyu Xu
Matthew Gonzales
Ofer Mendelevitch
Amin Ahmad
VLM, HILM
284
17
0
17 Oct 2024
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
International Conference on Learning Representations (ICLR), 2024
Xiaosen Zheng
Tianyu Pang
Chao Du
Qian Liu
Jing Jiang
Min Lin
317
27
0
09 Oct 2024
From Facts to Insights: A Study on the Generation and Evaluation of Analytical Reports for Deciphering Earnings Calls
International Conference on Computational Linguistics (COLING), 2024
Tomas Goldsack
Yang Wang
Chenghua Lin
Chung-Chi Chen
159
11
0
01 Oct 2024
T3: A Novel Zero-shot Transfer Learning Framework Iteratively Training on an Assistant Task for a Target Task
International Conference on Intelligent Computing (ICIC), 2024
Xindi Tong
Yujin Zhu
Shijian Fan
Liang Xu
535
1
0
26 Sep 2024
Speech vs. Transcript: Does It Matter for Human Annotators in Speech Summarization?
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Roshan S. Sharma
Suwon Shon
Mark Lindsey
Hira Dhamyal
Rita Singh
Bhiksha Raj
268
4
0
12 Aug 2024
Crafting the Path: Robust Query Rewriting for Information Retrieval
Ingeol Baek
Jimin Lee
Joonho Yang
Hwanhee Lee
247
11
0
17 Jul 2024
EVA-Score: Evaluating Abstractive Long-form Summarization on Informativeness through Extraction and Validation
Wendi Li
Xin Zhong
Chengsi Wang
Gaoche Wu
Bowen Zhou
234
2
0
06 Jul 2024
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations
Md Tahmid Rahman Laskar
Sawsan Alqahtani
M Saiful Bari
Mizanur Rahman
Mohammad Abdullah Matin Khan
...
Chee Wei Tan
Md. Rizwan Parvez
Enamul Hoque
Shafiq Joty
Jimmy Huang
ELM, ALM
303
104
0
04 Jul 2024
Detecting Errors through Ensembling Prompts (DEEP): An End-to-End LLM Framework for Detecting Factual Errors
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Alex Chandler
Devesh Surve
Hui Su
HILM, UQCV
189
4
0
18 Jun 2024
Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models
Zhangyue Yin
Qiushi Sun
Qipeng Guo
Zhiyuan Zeng
Xiaonan Li
...
Qinyuan Cheng
Ding Wang
Xiaofeng Mou
Xipeng Qiu
XuanJing Huang
LRM
278
9
0
21 May 2024
Large Language Models are Inconsistent and Biased Evaluators
Rickard Stureborg
Dimitris Alikaniotis
Yoshi Suhara
ALM
451
113
0
02 May 2024
FIZZ: Factual Inconsistency Detection by Zoom-in Summary and Zoom-out Document
Joonho Yang
Seunghyun Yoon
Byeongjeong Kim
Hwanhee Lee
HILM
360
13
0
17 Apr 2024
MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents
Liyan Tang
Philippe Laban
Greg Durrett
HILM, SyDa
436
192
0
16 Apr 2024
Less is More for Improving Automatic Evaluation of Factual Consistency
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Tong Wang
Ninad Kulkarni
Yanjun Qi
ALM
195
2
0
09 Apr 2024
SIFiD: Reassess Summary Factual Inconsistency Detection with LLM
Jiuding Yang
Hui Liu
Weidong Guo
Zhuwei Rao
Yu-Syuan Xu
Di Niu
HILM
305
1
0
12 Mar 2024
German also Hallucinates! Inconsistency Detection in News Summaries with the Absinth Dataset
Laura Mascarell
Ribin Chalumattu
Annette Rios
HILM
405
1
0
06 Mar 2024
A Comprehensive Survey on Process-Oriented Automatic Text Summarization with Exploration of LLM-Based Methods
Hanlei Jin
Yang Zhang
Dan Meng
Jun Wang
Jinghua Tan
820
200
0
05 Mar 2024
FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction
Alessandro Sciré
Karim Ghonim
Roberto Navigli
HILM
333
24
0
04 Mar 2024
Identifying Factual Inconsistencies in Summaries: Grounding Model Inference via Task Taxonomy
Liyan Xu
Zhenlin Su
Mo Yu
Jin Xu
Jinho D. Choi
Jie Zhou
Fei Liu
HILM
343
6
0
20 Feb 2024
FactPICO: Factuality Evaluation for Plain Language Summarization of Medical Evidence
Sebastian Antony Joseph
Lily Chen
Jan Trienes
Hannah Louisa Göke
Monika Coers
Wei Xu
Byron C. Wallace
Junyi Jessy Li
LM&MA, HILM
221
25
0
18 Feb 2024
Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2024
Simone Balloccu
Patrícia Schmidtová
Mateusz Lango
Ondrej Dusek
SILM, ELM, PILM
512
294
0
06 Feb 2024
Evaluating the Factuality of Zero-shot Summarizers Across Varied Domains
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2024
S. Ramprasad
Kundan Krishna
Zachary Chase Lipton
Byron C. Wallace
HILM
245
10
0
05 Feb 2024
Evaluating Large Language Models for Health-related Queries with Presuppositions
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Navreet Kaur
Monojit Choudhury
Danish Pruthi
HILM, ELM
279
12
0
14 Dec 2023
AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation
Haoyi Qiu
Kung-Hsiang Huang
Jingnong Qu
Nanyun Peng
HILM
337
12
0
16 Nov 2023
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
Lei Huang
Weijiang Yu
Weitao Ma
Weihong Zhong
Zhangyin Feng
...
Qianglong Chen
Weihua Peng
Xiaocheng Feng
Bing Qin
Ting Liu
LRM, HILM
555
2,311
0
09 Nov 2023
OpinSummEval: Revisiting Automated Evaluation for Opinion Summarization
Yuchen Shen
Xiaojun Wan
430
12
0
27 Oct 2023
On Context Utilization in Summarization with Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Mathieu Ravaut
Aixin Sun
Nancy F. Chen
Shafiq Joty
653
37
0
16 Oct 2023
Towards Better Evaluation of Instruction-Following: A Case-Study in Summarization
Conference on Computational Natural Language Learning (CoNLL), 2023
Ondrej Skopek
Rahul Aralikatte
Sian Gooding
Victor Carbune
ELM
340
23
0
12 Oct 2023
Well Begun is Half Done: Generator-agnostic Knowledge Pre-Selection for Knowledge-Grounded Dialogue
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Lang Qin
Yao Zhang
Hongru Liang
Jun Wang
Zhenglu Yang
281
3
0
11 Oct 2023
Benchmarking Cognitive Biases in Large Language Models as Evaluators
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Ryan Koo
Linghe Wang
Vipul Raheja
Jong Inn Park
Min Namgung
Luan Tuyen Chau
ALM
396
145
0
29 Sep 2023
LongDocFACTScore: Evaluating the Factuality of Long Document Abstractive Summarisation
International Conference on Language Resources and Evaluation (LREC), 2023
Jennifer A Bishop
Qianqian Xie
Sophia Ananiadou
HILM
331
20
0
21 Sep 2023
Can Large Language Models Discern Evidence for Scientific Hypotheses? Case Studies in the Social Sciences
International Conference on Language Resources and Evaluation (LREC), 2023
S. Koneru
Jian Wu
Sarah Rajtmajer
340
13
0
07 Sep 2023
Translate Meanings, Not Just Words: IdiomKB's Role in Optimizing Idiomatic Translation with Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2023
Shuang Li
Jiangjie Chen
Siyu Yuan
Xinyi Wu
Hao Yang
Shimin Tao
Yanghua Xiao
311
43
0
26 Aug 2023
System-Level Natural Language Feedback
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Weizhe Yuan
Dong Wang
Jason Weston
436
5
0
23 Jun 2023