2305.14251
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
23 May 2023
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Anuj Kumar
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM
ALM
Papers citing "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"
50 / 615 papers shown
AlignCheck: a Semantic Open-Domain Metric for Factual Consistency Assessment
Ahmad Aghaebrahimian
HILM
156
0
0
03 Dec 2025
Towards Unification of Hallucination Detection and Fact Verification for Large Language Models
Weihang Su
Jianming Long
Changyue Wang
Shiyu Lin
Jingyan Xu
Ziyi Ye
Qingyao Ai
Yiqun Liu
HILM
112
0
0
02 Dec 2025
Detecting AI Hallucinations in Finance: An Information-Theoretic Method Cuts Hallucination Rate by 92%
Mainak Singha
HILM
268
0
0
02 Dec 2025
BHRAM-IL: A Benchmark for Hallucination Recognition and Assessment in Multiple Indian Languages
Hrishikesh Terdalkar
Kirtan Bhojani
Aryan Dongare
Omm Aditya Behera
HILM
VLM
141
0
0
01 Dec 2025
TrackList: Tracing Back Query Linguistic Diversity for Head and Tail Knowledge in Open Large Language Models
Ioana Buhnila
Aman Sinha
Mathieu Constant
231
0
0
26 Nov 2025
MUCH: A Multilingual Claim Hallucination Benchmark
Jérémie Dentan
Alexi Canesse
Davide Buscaldi
A. Shabou
Sonia Vanier
HILM
215
0
0
21 Nov 2025
Beyond Component Strength: Synergistic Integration and Adaptive Calibration in Multi-Agent RAG Systems
Jithin Krishnan
60
0
0
21 Nov 2025
The Oracle and The Prism: A Decoupled and Efficient Framework for Generative Recommendation Explanation
Jiaheng Zhang
Daqiang Zhang
233
0
0
20 Nov 2025
ConInstruct: Evaluating Large Language Models on Conflict Detection and Resolution in Instructions
Xingwei He
Qianru Zhang
Pengfei Chen
Guanhua Chen
Linlin Yu
Yuan Yuan
Siu-Ming Yiu
217
0
0
18 Nov 2025
AA-Omniscience: Evaluating Cross-Domain Knowledge Reliability in Large Language Models
Declan Jackson
William Keating
George Cameron
Micah Hill-Smith
HILM
RALM
ELM
735
0
0
17 Nov 2025
Assessing Automated Fact-Checking for Medical LLM Responses with Knowledge Graphs
Shasha Zhou
Mingyu Huang
Jack Cole
Charles Britton
Ming Yin
Jan Wolber
Ke Li
86
1
0
16 Nov 2025
QA-Noun: Representing Nominal Semantics via Natural Language Question-Answer Pairs
Maria Tseytlin
Paul Roit
Omri Abend
Ido Dagan
Ayal Klein
52
0
0
16 Nov 2025
Consistency Is the Key: Detecting Hallucinations in LLM Generated Text By Checking Inconsistencies About Key Facts
Raavi Gupta
Pranav Hari Panicker
S. Bhatia
Ganesh Ramakrishnan
HILM
136
2
0
15 Nov 2025
Rethinking Retrieval-Augmented Generation for Medicine: A Large-Scale, Systematic Expert Evaluation and Practical Insights
Hyunjae Kim
Jiwoong Sohn
Aidan Gilson
Nicholas Cochran-Caggiano
Serina S Applebaum
...
James Zou
Andrew Taylor
Arman Cohan
Hua Xu
Qingyu Chen
RALM
LM&MA
335
3
0
10 Nov 2025
Evaluation of retrieval-based QA on QUEST-LOFT
Nathan Scales
Nathanael Scharli
Olivier Bousquet
RALM
376
0
0
08 Nov 2025
TSVer: A Benchmark for Fact Verification Against Time-Series Evidence
Marek Strong
Andreas Vlachos
AI4TS
144
0
0
02 Nov 2025
VISTA Score: Verification In Sequential Turn-based Assessment
A. Lewis
Andrew Perrault
Eric Fosler-Lussier
Michael White
HILM
284
0
0
30 Oct 2025
RCScore: Quantifying Response Consistency in Large Language Models
Dongjun Jang
Youngchae Ahn
Hyopil Shin
132
0
0
30 Oct 2025
CLINB: A Climate Intelligence Benchmark for Foundational Models
Michelle Chen Huebscher
Katharine Mach
Aleksandar Stanić
Markus Leippold
Ben Gaiarin
...
Massimiliano Ciaramita
Joeri Rogelj
Christian Buck
Lierni Sestorain Saralegui
Reto Knutti
HILM
ELM
311
0
0
29 Oct 2025
Evidence-Bound Autonomous Research (EviBound): A Governance Framework for Eliminating False Claims
Ruiying Chen
96
0
0
28 Oct 2025
MAD-Fact: A Multi-Agent Debate Framework for Long-Form Factuality Evaluation in LLMs
Yucheng Ning
Xixun Lin
Fang Fang
Yanan Cao
HILM
305
0
0
27 Oct 2025
SafetyPairs: Isolating Safety Critical Image Features with Counterfactual Image Generation
Alec Helbling
Shruti Palaskar
Kundan Krishna
Polo Chau
Leon A Gatys
Joseph Y Cheng
EGVM
194
1
0
24 Oct 2025
JointCQ: Improving Factual Hallucination Detection with Joint Claim and Query Generation
F. Xu
Huixuan Zhang
Zhenliang Zhang
Jiahao Wang
Xiaojun Wan
HILM
196
0
0
22 Oct 2025
Fine-Tuned Thoughts: Leveraging Chain-of-Thought Reasoning for Industrial Asset Health Monitoring
Shuxin Lin
Dhaval Patel
Christodoulos Constantinides
LRM
104
1
0
21 Oct 2025
Train for Truth, Keep the Skills: Binary Retrieval-Augmented Reward Mitigates Hallucinations
Tong Chen
Akari Asai
Luke Zettlemoyer
Hannaneh Hajishirzi
Faeze Brahman
OffRL
HILM
LRM
189
0
0
20 Oct 2025
ESI: Epistemic Uncertainty Quantification via Semantic-preserving Intervention for Large Language Models
Mingda Li
Xinyu Li
Weinan Zhang
Longxuan Ma
136
0
0
15 Oct 2025
The Curious Case of Factual (Mis)Alignment between LLMs' Short- and Long-Form Answers
Saad Obaid ul Islam
Anne Lauscher
Goran Glavaš
HILM
212
0
0
13 Oct 2025
FaStfact: Faster, Stronger Long-Form Factuality Evaluations in LLMs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Yingjia Wan
Haochen Tan
Xiao Zhu
Xinyu Zhou
Z. Li
...
Jiaqi Zeng
Yi Xu
Jianqiao Lu
Yinhong Liu
Zhijiang Guo
HILM
OffRL
559
0
0
13 Oct 2025
Inflated Excellence or True Performance? Rethinking Medical Diagnostic Benchmarks with Dynamic Evaluation
Xiangxu Zhang
Lei Li
Yanyun Zhou
Xiao Zhou
Y. Zhang
Xian Wu
LM&MA
ELM
181
0
0
10 Oct 2025
Large Language Models Do NOT Really Know What They Don't Know
C. Cheang
Hou Pong Chan
Wenxuan Zhang
Yang Deng
HILM
153
0
0
10 Oct 2025
Automated Refinement of Essay Scoring Rubrics for Language Models via Reflect-and-Revise
Keno Harada
Lui Yoshida
Takeshi Kojima
Yusuke Iwasawa
Yutaka Matsuo
106
0
0
10 Oct 2025
Comprehensiveness Metrics for Automatic Evaluation of Factual Recall in Text Generation
Adam Dejl
James Barry
Alessandra Pascale
Javier Carnerero-Cano
HILM
ELM
120
0
0
09 Oct 2025
PrismGS: Physically-Grounded Anti-Aliasing for High-Fidelity Large-Scale 3D Gaussian Splatting
Houqiang Zhong
Zhenglong Wu
Sihua Fu
Zihan Zheng
Xin Jin
X. Zhang
Li Song
Q. Hu
3DGS
112
5
0
09 Oct 2025
LeMAJ (Legal LLM-as-a-Judge): Bridging Legal Reasoning and LLM Evaluation
Joseph Enguehard
Morgane Van Ermengem
Kate Atkinson
Sujeong Cha
Arijit Ghosh Chowdhury
...
Jeremy Roghair
Hannah R Marlowe
Carina Suzana Negreanu
Kitty Boxall
Diana Mincu
AILaw
ELM
160
0
0
08 Oct 2025
Distributional Semantics Tracing: A Framework for Explaining Hallucinations in Large Language Models
Gagan Bhatia
Somayajulu G Sripada
Kevin Allan
Jacobo Azcona
HILM
LRM
273
1
0
07 Oct 2025
When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA
Elisei Rykov
Kseniia Petrushina
Maksim Savkin
Valerii Olisov
Artem Vazhentsev
Kseniia Titova
Ilseyar Alimova
Vasily Konovalov
Julia Belikova
HILM
181
2
0
06 Oct 2025
The Geometry of Truth: Layer-wise Semantic Dynamics for Hallucination Detection in Large Language Models
Amir Hameed Mir
HILM
148
0
0
06 Oct 2025
Sample, Align, Synthesize: Graph-Based Response Synthesis with ConGrs
Sayan Ghosh
Shahzaib Saqib Warraich
Dhruv Tarsadiya
Gregory Yauney
Swabha Swayamdipta
116
0
0
03 Oct 2025
Reward Models are Metrics in a Trench Coat
Sebastian Gehrmann
144
0
0
03 Oct 2025
Knowledge-Graph Based RAG System Evaluation Framework
Sicheng Dong
Vahid Zolfaghari
Nenad Petrovic
Alois C. Knoll
139
0
0
02 Oct 2025
TraceDet: Hallucination Detection from the Decoding Trace of Diffusion Large Language Models
Shenxu Chang
Junchi Yu
Weixing Wang
Yongqiang Chen
Jialin Yu
Philip Torr
Jindong Gu
HILM
153
0
0
30 Sep 2025
KnowGuard: Knowledge-Driven Abstention for Multi-Round Clinical Reasoning
Xilin Dang
Kexin Chen
Xiaorui Su
Ayush Noori
Inaki Arango
Lucas Vittor
Xinyi Long
Yuyang Du
Marinka Zitnik
Pheng-Ann Heng
116
1
0
29 Sep 2025
Knowledge-Level Consistency Reinforcement Learning: Dual-Fact Alignment for Long-Form Factuality
Junliang Li
Yucheng Wang
Yan Chen
Yu Ran
Ruiqing Zhang
Jing Liu
H. Wu
Haifeng Wang
OffRL
HILM
137
0
0
28 Sep 2025
EduVidQA: Generating and Evaluating Long-form Answers to Student Questions based on Lecture Videos
Sourjyadip Ray
Shubham Sharma
Somak Aditya
Pawan Goyal
AI4Ed
220
0
0
28 Sep 2025
Detecting Corpus-Level Knowledge Inconsistencies in Wikipedia with Large Language Models
Sina J. Semnani
Jirayu Burapacheep
Arpandeep Khatua
Thanawan Atchariyachanvanit
Zheng Wang
M. Lam
KELM
124
1
0
27 Sep 2025
Fine-Grained Detection of Context-Grounded Hallucinations Using LLMs
Yehonatan Peisakhovsky
Zorik Gekhman
Y. Mass
Liat Ein-Dor
Roi Reichart
HILM
152
1
0
26 Sep 2025
Comparative Personalization for Multi-document Summarization
Haoyuan Li
Snigdha Chaturvedi
108
0
0
25 Sep 2025
Concise and Sufficient Sub-Sentence Citations for Retrieval-Augmented Generation
Guo Chen
Qiuyuan Li
Qiuxian Li
Hongliang Dai
Xiang Chen
Piji Li
3DV
HILM
157
0
0
25 Sep 2025
Memory in Large Language Models: Mechanisms, Evaluation and Evolution
D. Zhang
Wendong Li
Kani Song
Jiaye Lu
Gang Li
Liuchun Yang
Sheng Li
KELM
208
1
0
23 Sep 2025
LLM-based Agents Suffer from Hallucinations: A Survey of Taxonomy, Methods, and Directions
Xixun Lin
Yucheng Ning
Jingwen Zhang
Yan Dong
Y. Liu
...
Bin Wang
Yanan Cao
Kai-xiang Chen
Songlin Hu
Li Guo
LLMAG
LRM
335
4
0
23 Sep 2025