FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
23 May 2023
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Anuj Kumar
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
    HILM, ALM

Papers citing "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"

50 / 608 papers shown
Textual Bayes: Quantifying Uncertainty in LLM-Based Systems
Brendan Leigh Ross
Noël Vouitsis
Atiyeh Ashari Ghomi
Rasa Hosseinzadeh
Ji Xin
...
Yi Sui
Shiyi Hou
Kin Kwan Leung
Gabriel Loaiza-Ganem
Jesse C. Cresswell
244
3
0
11 Jun 2025
LLM-as-a-qualitative-judge: automating error analysis in natural language generation
Nadezhda Chirkova
Tunde Oluwaseyi Ajayi
Seth Aycock
Zain Muhammad Mujahid
Vladana Perlić
Ekaterina Borisova
Markarit Vartampetian
ELM
218
0
0
10 Jun 2025
MiniCPM4: Ultra-Efficient LLMs on End Devices
MiniCPM Team
Chaojun Xiao
Yuxuan Li
Xu Han
Yuzhuo Bai
...
Zhiyuan Liu
Guoyang Zeng
Chao Jia
Dahai Li
Maosong Sun
MLLM
255
19
0
09 Jun 2025
ConfRAG: Confidence-Guided Retrieval-Augmenting Generation
Yin Huang
Yifan Ethan Xu
Kai Sun
Vera Yan
Alicia Sun
...
Aaron Colak
Anuj Kumar
Wen-tau Yih
Xin Luna Dong
HILM
226
2
0
08 Jun 2025
Beyond Facts: Evaluating Intent Hallucination in Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yijie Hao
Haofei Yu
Jiaxuan You
HILM, LRM
122
4
0
06 Jun 2025
Generating Grounded Responses to Counter Misinformation via Learning Efficient Fine-Grained Critiques
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Xiaofei Xu
Xiuzhen Zhang
Ke Deng
HILM
189
0
0
06 Jun 2025
Micro-Act: Mitigating Knowledge Conflict in LLM-based RAG via Actionable Self-Reasoning
Nan Huo
Jinyang Li
Bowen Qin
Ge Qu
Xiaolong Li
Xiaodong Li
Chenhao Ma
Reynold Cheng
RALM
256
1
0
05 Jun 2025
CLATTER: Comprehensive Entailment Reasoning for Hallucination Detection
Ron Eliav
Arie Cattan
Eran Hirsch
Shahaf Bassan
Elias Stengel-Eskin
Mohit Bansal
Ido Dagan
LRM
252
3
0
05 Jun 2025
SUCEA: Reasoning-Intensive Retrieval for Adversarial Fact-checking through Claim Decomposition and Editing
Hongjun Liu
Yilun Zhao
Arman Cohan
Chen Zhao
AAML, LRM
248
0
0
05 Jun 2025
SPARTA ALIGNMENT: Collectively Aligning Multiple Language Models through Combat
Yuru Jiang
Wenxuan Ding
Shangbin Feng
Greg Durrett
Yulia Tsvetkov
241
2
0
05 Jun 2025
High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning
Tim Franzmeyer
Archie Sravankumar
Lijuan Liu
Yuning Mao
Rui Hou
Sinong Wang
Jakob Foerster
Luke Zettlemoyer
Madian Khabsa
KELM, ALM
197
0
0
04 Jun 2025
TracLLM: A Generic Framework for Attributing Long Context LLMs
Yanting Wang
Wei Zou
Runpeng Geng
Jinyuan Jia
LLMAG
403
3
0
04 Jun 2025
Shaking to Reveal: Perturbation-Based Detection of LLM Hallucinations
Jinyuan Luo
Zhen Fang
Shouqing Yang
Seongheon Park
Ling Chen
AAML, HILM
189
0
0
03 Jun 2025
LAQuer: Localized Attribution Queries in Content-grounded Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Eran Hirsch
Aviv Slobodkin
David Wan
Elias Stengel-Eskin
Mohit Bansal
Ido Dagan
189
5
0
01 Jun 2025
Reconsidering LLM Uncertainty Estimation Methods in the Wild
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yavuz Faruk Bakman
D. Yaldiz
Sungmin Kang
Tuo Zhang
Baturalp Buyukates
Salman Avestimehr
Sai Praneeth Karimireddy
173
4
0
01 Jun 2025
Inter-Passage Verification for Multi-evidence Multi-answer QA
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Bingsen Chen
Shengjie Wang
Xi Ye
Chen Zhao
RALM
140
0
0
31 May 2025
Vid2Coach: Transforming How-To Videos into Task Assistants
ACM Symposium on User Interface Software and Technology (UIST), 2025
Mina Huh
Zihui Xue
Ujjaini Das
Kumar Ashutosh
Kristen Grauman
Amy Pavel
193
4
0
31 May 2025
HD-NDEs: Neural Differential Equations for Hallucination Detection in LLMs
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Qing Li
Fauzan Farooqui
Zongxiong Chen
Derui Zhu
Yuxia Wang
Congbo Ma
Chenyang Lyu
Fakhri Karray
198
2
0
30 May 2025
Improving Reliability and Explainability of Medical Question Answering through Atomic Fact Checking in Retrieval-Augmented LLMs
Juraj Vladika
Annika Domres
Mai Nguyen
Rebecca Moser
Jana Nano
...
Denise Bernhardt
Stephanie E. Combs
Kai J. Borm
Florian Matthes
J. Peeken
HILM
159
2
0
30 May 2025
LaMP-QA: A Benchmark for Personalized Long-form Question Answering
Alireza Salemi
Hamed Zamani
256
5
0
30 May 2025
WikiGap: Promoting Epistemic Equity by Surfacing Knowledge Gaps Between English Wikipedia and other Language Editions
Zining Wang
Yuxuan Zhang
Dongwook Yoon
Nicholas Vincent
Farhan Samir
Vered Shwartz
KELM
237
1
0
30 May 2025
Reinforcement Learning for Better Verbalized Confidence in Long-Form Generation
Caiqi Zhang
Xiaochen Zhu
Chengzu Li
Nigel Collier
Andreas Vlachos
OffRL, HILM
227
7
0
29 May 2025
From Parameters to Prompts: Understanding and Mitigating the Factuality Gap between Fine-Tuned LLMs
Xuan Gong
Hanbo Huang
Shiyu Liang
181
0
0
29 May 2025
ARC: Argument Representation and Coverage Analysis for Zero-Shot Long Document Summarization with Instruction Following LLMs
Mohamed S. Elaraby
Diane Litman
LLMAG
151
0
0
29 May 2025
How Does Response Length Affect Long-Form Factuality
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
James Xu Zhao
Jimmy Z.J. Liu
Bryan Hooi
See-Kiong Ng
HILM, KELM
188
3
0
29 May 2025
Found in Translation: Measuring Multilingual LLM Consistency as Simple as Translate then Evaluate
Ashim Gupta
Maitrey Mehta
Zhichao Xu
Vivek Srikumar
180
1
0
28 May 2025
Retrieval-Augmented Generation: A Comprehensive Survey of Architectures, Enhancements, and Robustness Frontiers
Chaitanya Sharma
RALM, 3DV
234
5
0
28 May 2025
LegalSearchLM: Rethinking Legal Case Retrieval as Legal Elements Generation
Chaeeun Kim
Jinu Lee
Wonseok Hwang
AILaw, RALM, ELM
251
0
0
28 May 2025
Uncertainty-Aware Attention Heads: Efficient Unsupervised Uncertainty Quantification for LLMs
Artem Vazhentsev
Abdelrahman Boda Sadallah
Gleb Kuzmin
Ekaterina Fadeeva
Ivan Lazichny
...
Maxim Panov
Timothy Baldwin
Mrinmaya Sachan
Preslav Nakov
Artem Shelmanov
EDL, HILM
335
3
0
26 May 2025
Reasoning Is Not All You Need: Examining LLMs for Multi-Turn Mental Health Conversations
Mohit Chandra
Siddharth Sriraman
Harneet Singh Khanuja
Yiqiao Jin
Munmun De Choudhury
LM&MA, AI4MH, LRM
210
0
0
26 May 2025
MedHELM: Holistic Evaluation of Large Language Models for Medical Tasks
Suhana Bedi
Hejie Cui
Miguel Fuentes
Alyssa Unell
Michael Wornow
...
M. Lungren
Eric Horvitz
Abigail Z. Jacobs
M. Pfeffer
N. Shah
ELM, LM&MA, AI4MH
177
33
0
26 May 2025
ExAnte: A Benchmark for Ex-Ante Inference in Large Language Models
Yachuan Liu
Xiaochun Wei
Lin Shi
Xinnuo Li
Bohan Zhang
Paramveer S. Dhillon
Qiaozhu Mei
181
0
0
26 May 2025
Does quantization affect models' performance on long-context tasks?
Anmol Mekala
Anirudh Atmakuru
Yixiao Song
Marzena Karpinska
Mohit Iyyer
MQ
395
1
0
26 May 2025
DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research
Joao Coelho
Jingjie Ning
Jingyuan He
Kangrui Mao
Abhijay Paladugu
...
Jiahe Jin
Jamie Callan
João Magalhães
Bruno Martins
Chenyan Xiong
262
15
0
25 May 2025
MedScore: Generalizable Factuality Evaluation of Free-Form Medical Answers by Domain-adapted Claim Decomposition and Verification
Heyuan Huang
Alexandra DeLucia
Vijay Murari Tiyyala
Mark Dredze
HILM, MedIm
266
1
0
24 May 2025
Writing Like the Best: Exemplar-Based Expository Text Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yuxiang Liu
Kevin Chen-Chuan Chang
202
1
0
24 May 2025
Empirical Investigation of Latent Representational Dynamics in Large Language Models: A Manifold Evolution Perspective
Yukun Zhang
Qi Dong
AI4CE
132
0
0
24 May 2025
HASH-RAG: Bridging Deep Hashing with Retriever for Efficient, Fine Retrieval and Augmented Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jinyu Guo
Xunlei Chen
Qiyang Xia
Zhaokun Wang
Jie Ou
Libo Qin
Shunyu Yao
Wenhong Tian
441
2
0
22 May 2025
UNCLE: Benchmarking Uncertainty Expressions in Long-Form Generation
Ruihan Yang
Caiqi Zhang
Zhisong Zhang
Xinting Huang
Dong Yu
Nigel Collier
Deqing Yang
ELM
228
4
0
22 May 2025
X-MAS: Towards Building Multi-Agent Systems with Heterogeneous LLMs
Rui Ye
Xiangrui Liu
Qimin Wu
Xianghe Pang
Zhenfei Yin
Lei Bai
Siheng Chen
LLMAG
189
11
0
22 May 2025
EMULATE: A Multi-Agent Framework for Determining the Veracity of Atomic Claims by Emulating Human Actions
Spencer Hong
Meng Luo
Xinyi Wan
225
1
0
22 May 2025
MASLab: A Unified and Comprehensive Codebase for LLM-based Multi-Agent Systems
Rui Ye
Keduan Huang
Qimin Wu
Yuzhu Cai
Tian Jin
...
Bo An
Yang Gao
Wenjun Wu
Lei Bai
Siheng Chen
LLMAG
356
7
0
22 May 2025
Advancing the Scientific Method with Large Language Models: From Hypothesis to Discovery
Yanbo Zhang
S. Khan
Adnan Mahmud
Huck Yang
Alexander Lavin
...
James A. Evans
Alan R. Bundy
Jannis Brugger
Jesper Tegner
Hector Zenil
LM&MA
345
5
0
22 May 2025
CUB: Benchmarking Context Utilisation Techniques for Language Models
Lovisa Hagström
Youna Kim
Haeun Yu
Sang-goo Lee
Richard Johansson
Hyunsoo Cho
Isabelle Augenstein
202
2
0
22 May 2025
Long-Form Information Alignment Evaluation Beyond Atomic Facts
Danna Zheng
Mirella Lapata
Jeff Z. Pan
HILM
186
1
0
21 May 2025
UrduFactCheck: An Agentic Fact-Checking Framework for Urdu with Evidence Boosting and Benchmarking
Sarfraz Ahmad
Hasan Iqbal
Momina Ahsan
Numaan Naeem
Muhammad Ahsan Riaz Khan
Arham Riaz
Muhammad Arslan Manzoor
Yuxia Wang
Preslav Nakov
HILM, ELM
409
0
0
21 May 2025
Listen to the Context: Towards Faithful Large Language Models for Retrieval Augmented Generation on Climate Questions
David Thulke
Jakob Kemmler
Christian Dugast
Hermann Ney
RALM, HILM
137
0
0
21 May 2025
Pre-training Limited Memory Language Models with Internal and External Knowledge
Linxi Zhao
Sofian Zalouk
Christian K. Belardi
Justin Lovelace
Jin Peng Zhou
Ryan Thomas Noonan
Dongyoung Go
Kilian Q. Weinberger
Yoav Artzi
Jennifer J. Sun
KELM, HILM
328
0
0
21 May 2025
Hallucinate at the Last in Long Response Generation: A Case Study on Long Document Summarization
Joonho Yang
Seunghyun Yoon
Hwan Chang
Byeongjeong Kim
Hwanhee Lee
HILM
400
2
0
21 May 2025
Pierce the Mists, Greet the Sky: Decipher Knowledge Overshadowing via Knowledge Circuit Analysis
Haoming Huang
Yibo Yan
Jiahao Huo
Xin Zou
Xinfeng Li
Kun Wang
Xuming Hu
433
1
0
20 May 2025