FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
23 May 2023
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Anuj Kumar
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM, ALM

Papers citing "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"

50 / 608 papers shown
MUCH: A Multilingual Claim Hallucination Benchmark
Jérémie Dentan
Alexi Canesse
Davide Buscaldi
A. Shabou
Sonia Vanier
HILM
118
0
0
21 Nov 2025
The Oracle and The Prism: A Decoupled and Efficient Framework for Generative Recommendation Explanation
Jiaheng Zhang
Daqiang Zhang
173
0
0
20 Nov 2025
ConInstruct: Evaluating Large Language Models on Conflict Detection and Resolution in Instructions
Xingwei He
Qianru Zhang
Pengfei Chen
Guanhua Chen
Linlin Yu
Yuan Yuan
Siu-Ming Yiu
137
0
0
18 Nov 2025
AA-Omniscience: Evaluating Cross-Domain Knowledge Reliability in Large Language Models
Declan Jackson
William Keating
George Cameron
Micah Hill-Smith
HILM, RALM, ELM
484
0
0
17 Nov 2025
QA-Noun: Representing Nominal Semantics via Natural Language Question-Answer Pairs
Maria Tseytlin
Paul Roit
Omri Abend
Ido Dagan
Ayal Klein
32
0
0
16 Nov 2025
Assessing Automated Fact-Checking for Medical LLM Responses with Knowledge Graphs
Shasha Zhou
Mingyu Huang
Jack Cole
Charles Britton
Ming Yin
Jan Wolber
Ke Li
46
0
0
16 Nov 2025
Consistency Is the Key: Detecting Hallucinations in LLM Generated Text By Checking Inconsistencies About Key Facts
Raavi Gupta
Pranav Hari Panicker
S. Bhatia
Ganesh Ramakrishnan
HILM
84
0
0
15 Nov 2025
Rethinking Retrieval-Augmented Generation for Medicine: A Large-Scale, Systematic Expert Evaluation and Practical Insights
Hyunjae Kim
Jiwoong Sohn
Aidan Gilson
Nicholas Cochran-Caggiano
Serina S Applebaum
...
James Zou
Andrew Taylor
Arman Cohan
Hua Xu
Qingyu Chen
RALM, LM&MA
287
1
0
10 Nov 2025
Evaluation of retrieval-based QA on QUEST-LOFT
Nathan Scales
Nathanael Scharli
Olivier Bousquet
RALM
312
0
0
08 Nov 2025
TSVer: A Benchmark for Fact Verification Against Time-Series Evidence
Marek Strong
Andreas Vlachos
AI4TS
92
0
0
02 Nov 2025
RCScore: Quantifying Response Consistency in Large Language Models
Dongjun Jang
Youngchae Ahn
Hyopil Shin
76
0
0
30 Oct 2025
VISTA Score: Verification In Sequential Turn-based Assessment
A. Lewis
Andrew Perrault
Eric Fosler-Lussier
Michael White
HILM
260
0
0
30 Oct 2025
CLINB: A Climate Intelligence Benchmark for Foundational Models
Michelle Chen Huebscher
Katharine Mach
Aleksandar Stanić
Markus Leippold
Ben Gaiarin
...
Massimiliano Ciaramita
Joeri Rogelj
Christian Buck
Lierni Sestorain Saralegui
Reto Knutti
HILM, ELM
229
0
0
29 Oct 2025
Evidence-Bound Autonomous Research (EviBound): A Governance Framework for Eliminating False Claims
Ruiying Chen
40
0
0
28 Oct 2025
MAD-Fact: A Multi-Agent Debate Framework for Long-Form Factuality Evaluation in LLMs
Yucheng Ning
Xixun Lin
Fang Fang
Yanan Cao
HILM
265
0
0
27 Oct 2025
SafetyPairs: Isolating Safety Critical Image Features with Counterfactual Image Generation
Alec Helbling
Shruti Palaskar
Kundan Krishna
Polo Chau
Leon A Gatys
Joseph Y Cheng
EGVM
161
0
0
24 Oct 2025
JointCQ: Improving Factual Hallucination Detection with Joint Claim and Query Generation
F. Xu
Huixuan Zhang
Zhenliang Zhang
Jiahao Wang
Xiaojun Wan
HILM
156
0
0
22 Oct 2025
Fine-Tuned Thoughts: Leveraging Chain-of-Thought Reasoning for Industrial Asset Health Monitoring
Shuxin Lin
Dhaval Patel
Christodoulos Constantinides
LRM
84
1
0
21 Oct 2025
Train for Truth, Keep the Skills: Binary Retrieval-Augmented Reward Mitigates Hallucinations
Tong Chen
Akari Asai
Luke Zettlemoyer
Hannaneh Hajishirzi
Faeze Brahman
OffRL, HILM, LRM
157
0
0
20 Oct 2025
ESI: Epistemic Uncertainty Quantification via Semantic-preserving Intervention for Large Language Models
Mingda Li
Xinyu Li
Weinan Zhang
Longxuan Ma
92
0
0
15 Oct 2025
FaStfact: Faster, Stronger Long-Form Factuality Evaluations in LLMs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Yingjia Wan
Haochen Tan
Xiao Zhu
Xinyu Zhou
Z. Li
...
Jiaqi Zeng
Yi Xu
Jianqiao Lu
Yinhong Liu
Zhijiang Guo
HILM, OffRL
430
0
0
13 Oct 2025
The Curious Case of Factual (Mis)Alignment between LLMs' Short- and Long-Form Answers
Saad Obaid ul Islam
Anne Lauscher
Goran Glavaš
HILM
150
0
0
13 Oct 2025
Inflated Excellence or True Performance? Rethinking Medical Diagnostic Benchmarks with Dynamic Evaluation
Xiangxu Zhang
Lei Li
Yanyun Zhou
Xiao Zhou
Y. Zhang
Xian Wu
LM&MA, ELM
162
0
0
10 Oct 2025
Large Language Models Do NOT Really Know What They Don't Know
C. Cheang
Hou Pong Chan
Wenxuan Zhang
Yang Deng
HILM
140
0
0
10 Oct 2025
Automated Refinement of Essay Scoring Rubrics for Language Models via Reflect-and-Revise
Keno Harada
Lui Yoshida
Takeshi Kojima
Yusuke Iwasawa
Yutaka Matsuo
86
0
0
10 Oct 2025
PrismGS: Physically-Grounded Anti-Aliasing for High-Fidelity Large-Scale 3D Gaussian Splatting
Houqiang Zhong
Zhenglong Wu
Sihua Fu
Zihan Zheng
Xin Jin
X. Zhang
Li Song
Q. Hu
3DGS
100
4
0
09 Oct 2025
Comprehensiveness Metrics for Automatic Evaluation of Factual Recall in Text Generation
Adam Dejl
James Barry
Alessandra Pascale
Javier Carnerero-Cano
HILM, ELM
96
0
0
09 Oct 2025
LeMAJ (Legal LLM-as-a-Judge): Bridging Legal Reasoning and LLM Evaluation
Joseph Enguehard
Morgane Van Ermengem
Kate Atkinson
Sujeong Cha
Arijit Ghosh Chowdhury
...
Jeremy Roghair
Hannah R Marlowe
Carina Suzana Negreanu
Kitty Boxall
Diana Mincu
AILaw, ELM
144
0
0
08 Oct 2025
Distributional Semantics Tracing: A Framework for Explaining Hallucinations in Large Language Models
Gagan Bhatia
Somayajulu G Sripada
Kevin Allan
Jacobo Azcona
HILM, LRM
242
0
0
07 Oct 2025
When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA
Elisei Rykov
Kseniia Petrushina
Maksim Savkin
Valerii Olisov
Artem Vazhentsev
Kseniia Titova
Ilseyar Alimova
Vasily Konovalov
Julia Belikova
HILM
137
2
0
06 Oct 2025
The Geometry of Truth: Layer-wise Semantic Dynamics for Hallucination Detection in Large Language Models
Amir Hameed Mir
HILM
123
0
0
06 Oct 2025
Reward Models are Metrics in a Trench Coat
Sebastian Gehrmann
112
0
0
03 Oct 2025
Sample, Align, Synthesize: Graph-Based Response Synthesis with ConGrs
Sayan Ghosh
Shahzaib Saqib Warraich
Dhruv Tarsadiya
Gregory Yauney
Swabha Swayamdipta
100
0
0
03 Oct 2025
Knowledge-Graph Based RAG System Evaluation Framework
Sicheng Dong
Vahid Zolfaghari
Nenad Petrovic
Alois C. Knoll
114
0
0
02 Oct 2025
KnowGuard: Knowledge-Driven Abstention for Multi-Round Clinical Reasoning
Xilin Dang
Kexin Chen
Xiaorui Su
Ayush Noori
Inaki Arango
Lucas Vittor
Xinyi Long
Yuyang Du
Marinka Zitnik
Pheng-Ann Heng
72
1
0
29 Sep 2025
Knowledge-Level Consistency Reinforcement Learning: Dual-Fact Alignment for Long-Form Factuality
Junliang Li
Yucheng Wang
Yan Chen
Yu Ran
Ruiqing Zhang
Jing Liu
H. Wu
Haifeng Wang
OffRL, HILM
105
0
0
28 Sep 2025
EduVidQA: Generating and Evaluating Long-form Answers to Student Questions based on Lecture Videos
Sourjyadip Ray
Shubham Sharma
Somak Aditya
Pawan Goyal
AI4Ed
172
0
0
28 Sep 2025
Detecting Corpus-Level Knowledge Inconsistencies in Wikipedia with Large Language Models
Sina J. Semnani
Jirayu Burapacheep
Arpandeep Khatua
Thanawan Atchariyachanvanit
Zheng Wang
M. Lam
KELM
100
0
0
27 Sep 2025
Fine-Grained Detection of Context-Grounded Hallucinations Using LLMs
Yehonatan Peisakhovsky
Zorik Gekhman
Y. Mass
Liat Ein-Dor
Roi Reichart
HILM
103
1
0
26 Sep 2025
Concise and Sufficient Sub-Sentence Citations for Retrieval-Augmented Generation
Guo Chen
Qiuyuan Li
Qiuxian Li
Hongliang Dai
Xiang Chen
Piji Li
3DV, HILM
97
0
0
25 Sep 2025
Comparative Personalization for Multi-document Summarization
Haoyuan Li
Snigdha Chaturvedi
64
0
0
25 Sep 2025
Extractive Fact Decomposition for Interpretable Natural Language Inference in one Forward Pass
Nicholas Popovic
Michael Färber
66
0
0
23 Sep 2025
OraPO: Oracle-educated Reinforcement Learning for Data-efficient and Factual Radiology Report Generation
Zhuoxiao Chen
Hongyang Yu
Ying Xu
Yadan Luo
Long Duong
Yuan-Fang Li
OffRL, MedIm
116
0
0
23 Sep 2025
A Good Plan is Hard to Find: Aligning Models with Preferences is Misaligned with What Helps Users
Nishant Balepur
Matthew Shu
Yoo Yeon Sung
Seraphina Goldfarb-Tarrant
Shi Feng
Fumeng Yang
Rachel Rudinger
Jordan L. Boyd-Graber
170
0
0
23 Sep 2025
LLM-based Agents Suffer from Hallucinations: A Survey of Taxonomy, Methods, and Directions
Xixun Lin
Yucheng Ning
Jingwen Zhang
Yan Dong
Y. Liu
...
Bin Wang
Yanan Cao
Kai-xiang Chen
Songlin Hu
Li Guo
LLMAG, LRM
274
4
0
23 Sep 2025
Memory in Large Language Models: Mechanisms, Evaluation and Evolution
D. Zhang
Wendong Li
Kani Song
Jiaye Lu
Gang Li
Liuchun Yang
Sheng Li
KELM
173
1
0
23 Sep 2025
WildClaims: Information Access Conversations in the Wild(Chat)
Hideaki Joko
Shakiba Amirshahi
Charles L. A. Clarke
Faegheh Hasibi
92
0
0
22 Sep 2025
KAHAN: Knowledge-Augmented Hierarchical Analysis and Narration for Financial Data Narration
Yajing Yang
Tony Deng
Min-Yen Kan
76
0
0
21 Sep 2025
InteGround: On the Evaluation of Verification and Retrieval Planning in Integrative Grounding
Cheng Jiayang
Qianqian Zhuang
Haoran Li
Chunkit Chan
Xin Liu
Lin Qiu
Yangqiu Song
HILM
121
0
0
20 Sep 2025
A Novel Differential Feature Learning for Effective Hallucination Detection and Classification
Wenkai Wang
Vincent C. S. Lee
Yizhen Zheng
72
0
0
20 Sep 2025