FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long
  Form Text Generation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
23 May 2023
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Anuj Kumar
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM, ALM

Papers citing "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"

50 / 608 papers shown
SAFE: Improving LLM Systems using Sentence-Level In-generation Attribution
João Eduardo Batista
Emil Vatai
Mohamed Wahib
370
0
0
19 May 2025
What Are They Talking About? A Benchmark of Knowledge-Grounded Discussion Summarization
Weixiao Zhou
Junnan Zhu
Gengyao Li
Xianfu Cheng
Xinnian Liang
Feifei Zhai
Zhiyu Li
ALM
301
0
0
18 May 2025
Learning Auxiliary Tasks Improves Reference-Free Hallucination Detection in Open-Domain Long-Form Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Chengwei Qin
Wenxuan Zhou
Karthik Abinav Sankararaman
Nanshu Wang
Tengyu Xu
...
Aditya Tayade
Sinong Wang
Shafiq Joty
Han Fang
Hao Ma
HILM, LRM
210
0
0
18 May 2025
Latent Veracity Inference for Identifying Errors in Stepwise Reasoning
Minsu Kim
Jean-Pierre Falet
Oliver E. Richardson
Xiaoyin Chen
Moksh Jain
Sungjin Ahn
Sungsoo Ahn
Yoshua Bengio
KELM, ReLM, LRM
312
1
0
17 May 2025
THELMA: Task Based Holistic Evaluation of Large Language Model Applications-RAG Question Answering
Udita Patel
Rutu Mulkar
Jay Roberts
Cibi Chakravarthy Senthilkumar
Sujay Gandhi
Xiaofei Zheng
Naumaan Nayyar
Parul Kalra
Rafael Castrillo
141
0
0
16 May 2025
VeriFact: Enhancing Long-Form Factuality Evaluation with Refined Fact Extraction and Reference Facts
Xin Liu
Lechen Zhang
Sheza Munir
Yiyang Gu
Lu Wang
HILM
178
3
0
14 May 2025
Atomic Consistency Preference Optimization for Long-Form Question Answering
Jingfeng Chen
Raghuveer Thirukovalluru
Junlin Wang
Kaiwei Luo
Bhuwan Dhingra
KELM, HILM
222
2
0
14 May 2025
A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs
Artem Shelmanov
Ekaterina Fadeeva
Akim Tsvigun
Ivan Tsvigun
Zhuohan Xie
...
Caiqi Zhang
Artem Vazhentsev
Mrinmaya Sachan
Preslav Nakov
Timothy Baldwin
HILM
233
6
0
13 May 2025
Why Uncertainty Estimation Methods Fall Short in RAG: An Axiomatic Analysis
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Heydar Soudani
Evangelos Kanoulas
Faegheh Hasibi
321
5
0
12 May 2025
Integrating Video and Text: A Balanced Approach to Multimodal Summary Generation and Evaluation
Galann Pennec
Zhengyuan Liu
Nicholas Asher
Philippe Muller
Nancy F. Chen
VGen
359
0
0
10 May 2025
Summarisation of German Judgments in conjunction with a Class-based Evaluation
Bianca Steffes
Nils Torben Wiedemann
Alexander Gratz
Pamela Hochreither
Jana Elina Meyer
Katharina Luise Schilke
AILaw, ELM
191
0
0
09 May 2025
Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards
Manveer Singh Tamber
F. S. Bao
Chenyu Xu
Ge Luo
Suleman Kazi
Minseok Bae
Miaoran Li
Ofer Mendelevitch
Renyi Qu
Jimmy J. Lin
VLM
308
7
0
07 May 2025
Retrieval Augmented Generation Evaluation for Health Documents
Mario Ceresa
Lorenzo Bertolini
Valentin Comte
Nicholas Spadaro
Barbara Raffael
...
Sergio Consoli
Amalia Muñoz Piñeiro
Alex Patak
Maddalena Querci
Tobias Wiesenthal
RALM, 3DV
271
1
1
07 May 2025
UCSC at SemEval-2025 Task 3: Context, Models and Prompt Optimization for Automated Hallucination Detection in LLM Output
Sicong Huang
Jincheng He
Shiyuan Huang
Karthik Raja Anandan
Arkajyoti Chakraborty
Ian Lane
HILM, LRM
195
1
0
05 May 2025
Invoke Interfaces Only When Needed: Adaptive Invocation for Large Language Models in Question Answering
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Jihao Zhao
Chunlai Zhou
Daixuan Li
Shuaishuai Zu
Biao Qin
259
0
0
05 May 2025
A Comprehensive Analysis for Visual Object Hallucination in Large Vision-Language Models
Liqiang Jing
Guiming Hardy Chen
Ehsan Aghazadeh
Xin Eric Wang
Xinya Du
222
2
0
04 May 2025
VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding
Zongxia Li
Xiyang Wu
Guangyao Shi
Yubin Qin
Hongyang Du
Tianyi Zhou
Wanrong Zhu
Dinesh Manocha
Jordan Lee Boyd-Graber
MLLM
499
0
0
02 May 2025
A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage
Rui Xin
Niloofar Mireshghallah
Shuyue Stella Li
Michael Duan
Hyunwoo Kim
Yejin Choi
Yulia Tsvetkov
Sewoong Oh
Pang Wei Koh
331
17
0
28 Apr 2025
Towards Long Context Hallucination Detection
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Siyi Liu
Kishaloy Halder
Zheng Qi
Wei Xiao
Nikolaos Pappas
Phu Mon Htut
Neha Anna John
Yassine Benajiba
Dan Roth
HILM
224
9
0
28 Apr 2025
Chatbot Arena Meets Nuggets: Towards Explanations and Diagnostics in the Evaluation of LLM Responses
Sahel Sharifymoghaddam
Shivani Upadhyay
Nandan Thakur
Ronak Pradeep
Jimmy Lin
RALM
383
1
0
28 Apr 2025
An Empirical Study of Evaluating Long-form Question Answering
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025
Ning Xian
Yixing Fan
Ruqing Zhang
Maarten de Rijke
Jiafeng Guo
ELM
136
1
0
25 Apr 2025
HalluLens: LLM Hallucination Benchmark
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yejin Bang
Ziwei Ji
Alan Schelten
Anthony Hartshorn
Tara Fowler
Cheng Zhang
Nicola Cancedda
Pascale Fung
HILM
247
37
0
24 Apr 2025
Leveraging LLMs as Meta-Judges: A Multi-Agent Framework for Evaluating LLM Judgments
Jian Wang
Jama Hussein Mohamud
Chongren Sun
Di Wu
Benoit Boulet
LLMAG, ELM
307
6
0
23 Apr 2025
Exploring the Role of Large Language Models in Cybersecurity: A Systematic Survey
Shuang Tian
Tao Zhang
Qingbin Liu
Jiacheng Wang
Xuangou Wu
...
Ruichen Zhang
Feiyu Xiong
Zhenhui Yuan
Shiwen Mao
Dong In Kim
292
4
0
22 Apr 2025
The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025
Ronak Pradeep
Nandan Thakur
Shivani Upadhyay
Daniel Fernando Campos
Nick Craswell
Jimmy Lin
210
10
0
21 Apr 2025
Retrieval Augmented Generation Evaluation in the Era of Large Language Models: A Comprehensive Survey
Aoran Gan
Hao Yu
Kai Zhang
Qi Liu
Wenyu Yan
Zhenya Huang
Shiwei Tong
Guoping Hu
RALM, 3DV
243
9
0
21 Apr 2025
Transparentize the Internal and External Knowledge Utilization in LLMs with Trustworthy Citation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jiajun Shen
Tong Zhou
Yubo Chen
Delai Qiu
Shengping Liu
Kang Liu
Jun Zhao
HILM, RALM
368
1
0
21 Apr 2025
CoLoTa: A Dataset for Entity-based Commonsense Reasoning over Long-Tail Knowledge
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025
Armin Toroghi
Willis Guo
Scott Sanner
RALM, LRM
180
1
0
20 Apr 2025
Efficient MAP Estimation of LLM Judgment Performance with Prior Transfer
Huaizhi Qu
Inyoung Choi
Zhen Tan
Song Wang
Sukwon Yun
Qi Long
Faizan Siddiqui
Kwonjoon Lee
Tianlong Chen
226
4
0
17 Apr 2025
TALE: A Tool-Augmented Framework for Reference-Free Evaluation of Large Language Models
Sher Badshah
Ali Emami
Hassan Sajjad
LLMAG, ELM
277
1
0
10 Apr 2025
AgentAda: Skill-Adaptive Data Analytics for Tailored Insight Discovery
Amirhossein Abaskohi
A. Ramesh
Shailesh Nanisetty
Chirag Goel
David Vazquez
Christopher Pal
Spandana Gella
Giuseppe Carenini
I. Laradji
343
0
0
10 Apr 2025
How to Detect and Defeat Molecular Mirage: A Metric-Driven Benchmark for Hallucination in LLM-based Molecular Comprehension
Hao Li
Liuzhenghao Lv
He Cao
Zijing Liu
Zhiyuan Yan
Yu Wang
Yonghong Tian
Rui Wang
Li Yuan
266
4
0
10 Apr 2025
Plan-and-Refine: Diverse and Comprehensive Retrieval-Augmented Generation
Alireza Salemi
Chris Samarinas
Hamed Zamani
156
0
0
10 Apr 2025
HypoEval: Hypothesis-Guided Evaluation for Natural Language Generation
Mingxuan Li
Hanchen Li
Chenhao Tan
ALM, ELM
299
1
0
09 Apr 2025
Enabling Collaborative Parametric Knowledge Calibration for Retrieval-Augmented Vision Question Answering
Jiaqi Deng
Kaize Shi
Zonghan Wu
Huan Huo
Dingxian Wang
Guandong Xu
157
0
0
05 Apr 2025
Bonsai: Interpretable Tree-Adaptive Grounded Reasoning
Kate Sanders
Benjamin Van Durme
LRM
316
1
0
04 Apr 2025
BOOST: Bootstrapping Strategy-Driven Reasoning Programs for Program-Guided Fact-Checking
Qisheng Hu
Quanyu Long
Wenya Wang
LRM
257
2
0
03 Apr 2025
LRAGE: Legal Retrieval Augmented Generation Evaluation Tool
Minhu Park
Hongseok Oh
Eunkyung Choi
Wonseok Hwang
AILaw, RALM, ELM
281
2
0
02 Apr 2025
WikiVideo: Article Generation from Multiple Videos
Alexander Martin
Reno Kriz
William Walden
Kate Sanders
Hannah Recknor
Eugene Yang
Francis Ferraro
Benjamin Van Durme
DiffM, VGen
350
3
0
01 Apr 2025
LLMs for Explainable AI: A Comprehensive Survey
Ahsan Bilal
David Ebert
Beiyu Lin
465
31
0
31 Mar 2025
A Scalable Framework for Evaluating Health Language Models
Neil Mallinar
A. Heydari
Xin Liu
Anthony Z. Faranesh
Brent Winslow
...
Mark Malhotra
Shwetak N. Patel
Javier L. Prieto
Daniel J. McDuff
Ahmed A. Metwally
LM&MA
260
7
0
30 Mar 2025
FindTheFlaws: Annotated Errors for Detecting Flawed Reasoning and Scalable Oversight Research
Gabriel Recchia
Chatrik Singh Mangat
Issac Li
Gayatri Krishnakumar
ALM
270
0
0
29 Mar 2025
Fact-checking AI-generated news reports: Can LLMs catch their own lies?
Jiayi Yao
Haibo Sun
Nianwen Xue
HILM
167
0
0
24 Mar 2025
VeriSafe Agent: Safeguarding Mobile GUI Agent via Logic-based Action Verification
Jungjae Lee
Dongjae Lee
Chihun Choi
Youngmin Im
Jaeyoung Wi
Kihong Heo
Sangeun Oh
Sunjae Lee
Insik Shin
LLMAG
270
5
0
24 Mar 2025
SciClaims: An End-to-End Generative System for Biomedical Claim Analysis
Raúl Ortega
José Manuel Gómez-Pérez
292
2
0
24 Mar 2025
ProDehaze: Prompting Diffusion Models Toward Faithful Image Dehazing
Tianwen Zhou
Jing Wang
Songtao Wu
Kuanhong Xu
DiffM
248
0
0
21 Mar 2025
FactSelfCheck: Fact-Level Black-Box Hallucination Detection for LLMs
Albert Sawczyn
Jakub Binkowski
Denis Janiak
Bogdan Gabrys
Tomasz Kajdanowicz
HILM, LRM
349
3
0
21 Mar 2025
Can one size fit all?: Measuring Failure in Multi-Document Summarization Domain Transfer
Alexandra DeLucia
Mark Dredze
293
0
0
20 Mar 2025
Uncertainty Quantification and Confidence Calibration in Large Language Models: A Survey
Xiaoou Liu
Tiejin Chen
Longchao Da
Chacha Chen
Zhen Lin
Hua Wei
HILM
424
35
0
20 Mar 2025
Extract, Match, and Score: An Evaluation Paradigm for Long Question-context-answer Triplets in Financial Analysis
Bo Hu
Han Yuan
Vlad Pandelea
Wuqiong Luo
Yingzhu Zhao
Zheng Ma
193
1
0
20 Mar 2025