FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

23 May 2023

Pang Wei Koh

Luke Zettlemoyer

Papers citing "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"

50 / 615 papers shown

AlignCheck: a Semantic Open-Domain Metric for Factual Consistency Assessment

Ahmad Aghaebrahimian

HILM

156

03 Dec 2025

Towards Unification of Hallucination Detection and Fact Verification for Large Language Models

119

02 Dec 2025

Detecting AI Hallucinations in Finance: An Information-Theoretic Method Cuts Hallucination Rate by 92%

Mainak Singha

HILM

274

02 Dec 2025

BHRAM-IL: A Benchmark for Hallucination Recognition and Assessment in Multiple Indian Languages

141

01 Dec 2025

TrackList: Tracing Back Query Linguistic Diversity for Head and Tail Knowledge in Open Large Language Models

Ioana Buhnila

Aman Sinha

Mathieu Constant

242

26 Nov 2025

MUCH: A Multilingual Claim Hallucination Benchmark

220

21 Nov 2025

Beyond Component Strength: Synergistic Integration and Adaptive Calibration in Multi-Agent RAG Systems

Jithin Krishnan

21 Nov 2025

The Oracle and The Prism: A Decoupled and Efficient Framework for Generative Recommendation Explanation

Jiaheng Zhang

Daqiang Zhang

239

20 Nov 2025

ConInstruct: Evaluating Large Language Models on Conflict Detection and Resolution in Instructions

217

18 Nov 2025

AA-Omniscience: Evaluating Cross-Domain Knowledge Reliability in Large Language Models

737

17 Nov 2025

Assessing Automated Fact-Checking for Medical LLM Responses with Knowledge Graphs

16 Nov 2025

QA-Noun: Representing Nominal Semantics via Natural Language Question-Answer Pairs

16 Nov 2025

Consistency Is the Key: Detecting Hallucinations in LLM Generated Text By Checking Inconsistencies About Key Facts

142

15 Nov 2025

Rethinking Retrieval-Augmented Generation for Medicine: A Large-Scale, Systematic Expert Evaluation and Practical Insights

Hyunjae Kim

Jiwoong Sohn

Aidan Gilson

Nicholas Cochran-Caggiano

...

344

10 Nov 2025

Evaluation of retrieval-based QA on QUEST-LOFT

376

08 Nov 2025

TSVer: A Benchmark for Fact Verification Against Time-Series Evidence

Marek Strong

Andreas Vlachos

AI4TS

144

02 Nov 2025

VISTA Score: Verification In Sequential Turn-based Assessment

290

30 Oct 2025

RCScore: Quantifying Response Consistency in Large Language Models

Dongjun Jang

Youngchae Ahn

Hyopil Shin

140

30 Oct 2025

CLINB: A Climate Intelligence Benchmark for Foundational Models

Michelle Chen Huebscher

...

Massimiliano Ciaramita

Joeri Rogelj

Christian Buck

Lierni Sestorain Saralegui

Reto Knutti

HILM ELM

313

29 Oct 2025

Evidence-Bound Autonomous Research (EviBound): A Governance Framework for Eliminating False Claims

Ruiying Chen

28 Oct 2025

MAD-Fact: A Multi-Agent Debate Framework for Long-Form Factuality Evaluation in LLMs

313

27 Oct 2025

SafetyPairs: Isolating Safety Critical Image Features with Counterfactual Image Generation

198

24 Oct 2025

JointCQ: Improving Factual Hallucination Detection with Joint Claim and Query Generation

200

22 Oct 2025

Fine-Tuned Thoughts: Leveraging Chain-of-Thought Reasoning for Industrial Asset Health Monitoring

Shuxin Lin

Dhaval Patel

Christodoulos Constantinides

LRM

105

21 Oct 2025

Train for Truth, Keep the Skills: Binary Retrieval-Augmented Reward Mitigates Hallucinations

189

20 Oct 2025

ESI: Epistemic Uncertainty Quantification via Semantic-preserving Intervention for Large Language Models

139

15 Oct 2025

The Curious Case of Factual (Mis)Alignment between LLMs' Short- and Long-Form Answers

212

13 Oct 2025

FaStfact: Faster, Stronger Long-Form Factuality Evaluations in LLMs

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025

...

564

13 Oct 2025

Inflated Excellence or True Performance? Rethinking Medical Diagnostic Benchmarks with Dynamic Evaluation

185

10 Oct 2025

Large Language Models Do NOT Really Know What They Don't Know

154

10 Oct 2025

Automated Refinement of Essay Scoring Rubrics for Language Models via Reflect-and-Revise

109

10 Oct 2025

Comprehensiveness Metrics for Automatic Evaluation of Factual Recall in Text Generation

Adam Dejl

James Barry

Alessandra Pascale

Javier Carnerero-Cano

HILM ELM

122

09 Oct 2025

PrismGS: Physically-Grounded Anti-Aliasing for High-Fidelity Large-Scale 3D Gaussian Splatting

117

09 Oct 2025

LeMAJ (Legal LLM-as-a-Judge): Bridging Legal Reasoning and LLM Evaluation

Arijit Ghosh Chowdhury

...

Jeremy Roghair

Hannah R Marlowe

Carina Suzana Negreanu

Kitty Boxall

Diana Mincu

AILaw ELM

164

08 Oct 2025

Distributional Semantics Tracing: A Framework for Explaining Hallucinations in Large Language Models

274

07 Oct 2025

When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA

191

06 Oct 2025

The Geometry of Truth: Layer-wise Semantic Dynamics for Hallucination Detection in Large Language Models

Amir Hameed Mir

HILM

149

06 Oct 2025

Sample, Align, Synthesize: Graph-Based Response Synthesis with ConGrs

Sayan Ghosh

Shahzaib Saqib Warraich

Dhruv Tarsadiya

Gregory Yauney

Swabha Swayamdipta

117

03 Oct 2025

Reward Models are Metrics in a Trench Coat

Sebastian Gehrmann

144

03 Oct 2025

Knowledge-Graph Based RAG System Evaluation Framework

145

02 Oct 2025

TraceDet: Hallucination Detection from the Decoding Trace of Diffusion Large Language Models

153

30 Sep 2025

KnowGuard: Knowledge-Driven Abstention for Multi-Round Clinical Reasoning

116

29 Sep 2025

Knowledge-Level Consistency Reinforcement Learning: Dual-Fact Alignment for Long-Form Factuality

143

28 Sep 2025

EduVidQA: Generating and Evaluating Long-form Answers to Student Questions based on Lecture Videos

223

28 Sep 2025

Detecting Corpus-Level Knowledge Inconsistencies in Wikipedia with Large Language Models

Sina J. Semnani

Jirayu Burapacheep

Arpandeep Khatua

Thanawan Atchariyachanvanit

Zheng Wang

M. Lam

KELM

124

27 Sep 2025

Fine-Grained Detection of Context-Grounded Hallucinations Using LLMs

Yehonatan Peisakhovsky

159

26 Sep 2025

Comparative Personalization for Multi-document Summarization

Haoyuan Li

Snigdha Chaturvedi

108

25 Sep 2025

Concise and Sufficient Sub-Sentence Citations for Retrieval-Augmented Generation

162

25 Sep 2025

Memory in Large Language Models: Mechanisms, Evaluation and Evolution

213

23 Sep 2025

LLM-based Agents Suffer from Hallucinations: A Survey of Taxonomy, Methods, and Directions

...

341

23 Sep 2025