ChatGPT as a Factual Inconsistency Evaluator for Text Summarization

27 March 2023

Zheheng Luo

Papers citing "ChatGPT as a Factual Inconsistency Evaluator for Text Summarization"

50 / 54 papers shown

OPOR-Bench: Evaluating Large Language Models on Online Public Opinion Report Generation

01 Dec 2025

Learning to Reason for Hallucination Span Detection

02 Oct 2025

SCI-Verifier: Scientific Verifier with Thinking

...

29 Sep 2025

Fine-Grained Detection of Context-Grounded Hallucinations Using LLMs

Yehonatan Peisakhovsky

26 Sep 2025

Principled Detection of Hallucinations in Large Language Models via Multiple Testing

Jiawei Li

A. Magesh

Venugopal V. Veeravalli

HILM

25 Aug 2025

Your Agent Can Defend Itself against Backdoor Attacks

10 Jun 2025

Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation

05 Jun 2025

RvLLM: LLM Runtime Verification with Domain Knowledge

24 May 2025

Long-Form Information Alignment Evaluation Beyond Atomic Facts

21 May 2025

Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards

07 May 2025

A Survey on Transformer Context Extension: Approaches and Evaluation

17 Mar 2025

GraphEval: A Lightweight Graph-Based LLM Framework for Idea Evaluation

International Conference on Learning Representations (ICLR), 2025

Tao Feng

Yihang Sun

Jiaxuan You

16 Mar 2025

Hallucination Detection in Large Language Models with Metamorphic Relations

20 Feb 2025

SummExecEdit: A Factual Consistency Benchmark in Summarization with Executable Edits

17 Dec 2024

Do Automatic Factuality Metrics Measure Factuality? A Critical Evaluation

S. Ramprasad

Byron C. Wallace

LLMAG HILM

25 Nov 2024

Multi-hop Evidence Pursuit Meets the Web: Team Papelo at FEVER 2024

Christopher Malon

LRM

08 Nov 2024

FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

Renyi Qu

...

17 Oct 2024

Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates

International Conference on Learning Representations (ICLR), 2024

Qian Liu

09 Oct 2024

From Facts to Insights: A Study on the Generation and Evaluation of Analytical Reports for Deciphering Earnings Calls

International Conference on Computational Linguistics (COLING), 2024

Tomas Goldsack

Yang Wang

Chenghua Lin

Chung-Chi Chen

01 Oct 2024

T3: A Novel Zero-shot Transfer Learning Framework Iteratively Training on an Assistant Task for a Target Task

International Conference on Intelligent Computing (ICIC), 2024

26 Sep 2024

Speech vs. Transcript: Does It Matter for Human Annotators in Speech Summarization?

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Bhiksha Raj

12 Aug 2024

Crafting the Path: Robust Query Rewriting for Information Retrieval

17 Jul 2024

EVA-Score: Evaluating Abstractive Long-form Summarization on Informativeness through Extraction and Validation

06 Jul 2024

A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations

Md Tahmid Rahman Laskar

Sawsan Alqahtani

M Saiful Bari

Mizanur Rahman

Mohammad Abdullah Matin Khan

...

Enamul Hoque

Jimmy Huang

04 Jul 2024

Detecting Errors through Ensembling Prompts (DEEP): An End-to-End LLM Framework for Detecting Factual Errors

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

18 Jun 2024

Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models

Qipeng Guo

...

Xipeng Qiu

XuanJing Huang

LRM

21 May 2024

Large Language Models are Inconsistent and Biased Evaluators

02 May 2024

FIZZ: Factual Inconsistency Detection by Zoom-in Summary and Zoom-out Document

17 Apr 2024

MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents

16 Apr 2024

Less is More for Improving Automatic Evaluation of Factual Consistency

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

09 Apr 2024

SIFiD: Reassess Summary Factual Inconsistency Detection with LLM

12 Mar 2024

German also Hallucinates! Inconsistency Detection in News Summaries with the Absinth Dataset

06 Mar 2024

A Comprehensive Survey on Process-Oriented Automatic Text Summarization with Exploration of LLM-Based Methods

05 Mar 2024

FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction

04 Mar 2024

Identifying Factual Inconsistencies in Summaries: Grounding Model Inference via Task Taxonomy

20 Feb 2024

FactPICO: Factuality Evaluation for Plain Language Summarization of Medical Evidence

Sebastian Antony Joseph

18 Feb 2024

Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs

Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2024

06 Feb 2024

Evaluating the Factuality of Zero-shot Summarizers Across Varied Domains

Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2024

05 Feb 2024

Evaluating Large Language Models for Health-related Queries with Presuppositions

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

14 Dec 2023

AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation

16 Nov 2023

A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

...

09 Nov 2023

OpinSummEval: Revisiting Automated Evaluation for Opinion Summarization

Yuchen Shen

Xiaojun Wan

27 Oct 2023

On Context Utilization in Summarization with Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

16 Oct 2023

Towards Better Evaluation of Instruction-Following: A Case-Study in Summarization

Conference on Computational Natural Language Learning (CoNLL), 2023

12 Oct 2023

Well Begun is Half Done: Generator-agnostic Knowledge Pre-Selection for Knowledge-Grounded Dialogue

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Yao Zhang

11 Oct 2023

Benchmarking Cognitive Biases in Large Language Models as Evaluators

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

29 Sep 2023

LongDocFACTScore: Evaluating the Factuality of Long Document Abstractive Summarisation

International Conference on Language Resources and Evaluation (LREC), 2023

21 Sep 2023

Can Large Language Models Discern Evidence for Scientific Hypotheses? Case Studies in the Social Sciences

International Conference on Language Resources and Evaluation (LREC), 2023

S. Koneru

Jian Wu

Sarah Rajtmajer

07 Sep 2023

Translate Meanings, Not Just Words: IdiomKB's Role in Optimizing Idiomatic Translation with Language Models

AAAI Conference on Artificial Intelligence (AAAI), 2023

Xinyi Wu

26 Aug 2023

System-Level Natural Language Feedback

Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023

Weizhe Yuan

Dong Wang

Jason Weston

23 Jun 2023