FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

23 May 2023

Pang Wei Koh

Luke Zettlemoyer

Papers citing "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"

50 / 615 papers shown

HalluDial: A Large-Scale Benchmark for Automatic Dialogue-Level Hallucination Evaluation

Wei Li

298

11 Jun 2024

Post-Hoc Answer Attribution for Grounded and Trustworthy Long Document Comprehension: Task, Insights, and Challenges

Abhilasha Sancheti

Koustava Goswami

Balaji Vasan Srinivasan

RALM

266

11 Jun 2024

A Probabilistic Framework for LLM Hallucination Detection via Belief Tree Propagation

Bairu Hou

Yang Zhang

Jacob Andreas

Shiyu Chang

299

11 Jun 2024

Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning

Hannaneh Hajishirzi

253

10 Jun 2024

Verifiable Generation with Subsentence-Level Fine-Grained Citations

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Shuyang Cao

Lu Wang

300

10 Jun 2024

Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation

265

08 Jun 2024

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

International Conference on Learning Representations (ICLR), 2024

Bill Yuchen Lin

Yuntian Deng

Khyathi Chandu

Faeze Brahman

Abhilasha Ravichander

Valentina Pyatkin

Nouha Dziri

Ronan Le Bras

Yejin Choi

268

139

07 Jun 2024

MAIRA-2: Grounded Radiology Report Generation

Shruthi Bannur

Kenza Bouzid

Daniel Coelho De Castro

...

Maria T. A. Wetscherek

Javier Alvarez-Valle

Stephanie L. Hyland

220

102

06 Jun 2024

PaCE: Parsimonious Concept Engineering for Large Language Models

Neural Information Processing Systems (NeurIPS), 2024

260

06 Jun 2024

AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways

Sheng Wen

392

125

04 Jun 2024

Safeguarding Large Language Models: A Survey

...

254

03 Jun 2024

When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs

381

150

03 Jun 2024

CtrlA: Adaptive Retrieval-Augmented Generation via Probe-Guided Control

268

29 May 2024

Nearest Neighbor Speculative Decoding for LLM Generation and Attribution

707

29 May 2024

TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models

246

28 May 2024

GRAG: Graph Retrieval-Augmented Generation

510

26 May 2024

Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection

...

247

25 May 2024

Certifiably Robust RAG against Retrieval Corruption

305

24 May 2024

AGRaME: Any-Granularity Ranking with Multi-Vector Embeddings

Heng Ji

146

23 May 2024

RefChecker: Reference-based Fine-grained Hallucination Checker and Benchmark for Large Language Models

Qipeng Guo

Yue Zhang

267

23 May 2024

Can LLMs Solve longer Math Word Problems Better?

International Conference on Learning Representations (ICLR), 2024

540

23 May 2024

CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models

Guangzhi Sun

Potsawee Manakul

226

22 May 2024

Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation

Gauthier Guinet

Behrooz Omidvar-Tehrani

Hao Ding

Laurent Callot

RALM

277

22 May 2024

Atomic Self-Consistency for Better Long Form Generations

Raghuveer Thirukovalluru

Yukun Huang

Bhuwan Dhingra

217

21 May 2024

OLAPH: Improving Factuality in Biomedical Long-form Question Answering

435

21 May 2024

Large Language Models Meet NLP: A Survey

455

119

21 May 2024

Question-Based Retrieval using Atomic Units for Enterprise RAG

Vatsal Raina

Mark Gales

132

20 May 2024

SciQAG: A Framework for Auto-Generated Science Question Answering Dataset with Fine-grained Evaluation

224

16 May 2024

LLMs can learn self-restraint through iterative self-reflection

347

15 May 2024

Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

405

225

09 May 2024

One vs. Many: Comprehending Accurate Information from Multiple Erroneous and Inconsistent AI Generations

Conference on Fairness, Accountability and Transparency (FAccT), 2024

John Joon Young Chung

Eytan Adar

Juho Kim

230

09 May 2024

OpenFactCheck: Building, Benchmarking Customized Fact-Checking Systems and Evaluating the Factuality of Claims and LLMs

415

09 May 2024

Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training

212

06 May 2024

Recall Them All: Retrieval-Augmented Language Models for Long Object List Extraction from Long Documents

418

04 May 2024

FLAME: Factuality-Aware Alignment for Large Language Models

Neural Information Processing Systems (NeurIPS), 2024

191

02 May 2024

On the Evaluation of Machine-Generated Reports

...

332

02 May 2024

GRAMMAR: Grounded and Modular Methodology for Assessment of Closed-Domain Retrieval-Augmented Language Model

399

30 Apr 2024

From Matching to Generation: A Survey on Generative Information Retrieval

Xiaoxi Li

Jiajie Jin

Peitian Zhang

548

132

23 Apr 2024

ISQA: Informative Factuality Feedback for Scientific Summarization

240

20 Apr 2024

AmbigDocs: Reasoning across Documents on Different Entities under the Same Name

Yoonsang Lee

Xi Ye

Eunsol Choi

377

18 Apr 2024

Unifying Bias and Unfairness in Information Retrieval: A Survey of Challenges and Opportunities with Large Language Models

Liang Pang

Jun Xu

303

17 Apr 2024

FIZZ: Factual Inconsistency Detection by Zoom-in Summary and Zoom-out Document

317

17 Apr 2024

MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents

327

171

16 Apr 2024

NoticIA: A Clickbait Article Summarization Dataset in Spanish

Iker García-Ferrero

Begoña Altuna

324

11 Apr 2024

Best Practices and Lessons Learned on Synthetic Data for Language Models

Ruibo Liu

...

Diyi Yang

303

112

11 Apr 2024

Pitfalls of Conversational LLMs on News Debiasing

Ipek Baris Schlicht

Defne Altiok

Maryanne Taouk

Lucie Flek

237

09 Apr 2024

Characterizing Multimodal Long-form Summarization: A Case Study on Financial Reports

147

09 Apr 2024

Know When To Stop: A Study of Semantic Drift in Text Generation

236

08 Apr 2024

FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback

Liqiang Jing

Xinya Du

382

07 Apr 2024

PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics

Zongxiong Chen

203

06 Apr 2024