ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.14251
  4. Cited By
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long
  Form Text Generation
v1v2 (latest)

FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
23 May 2023
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Anuj Kumar
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
    HILMALM
ArXiv (abs)PDFHTMLHuggingFace (2 upvotes)

Papers citing "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"

50 / 608 papers shown
Title
Is Factuality Decoding a Free Lunch for LLMs? Evaluation on Knowledge
  Editing Benchmark
Is Factuality Decoding a Free Lunch for LLMs? Evaluation on Knowledge Editing Benchmark
Baolong Bi
Shenghua Liu
Yiwei Wang
Lingrui Mei
Xueqi Cheng
KELM
82
3
0
30 Mar 2024
LUQ: Long-text Uncertainty Quantification for LLMs
LUQ: Long-text Uncertainty Quantification for LLMs
Caiqi Zhang
Fangyu Liu
Marco Basaldella
Nigel Collier
HILM
305
60
0
29 Mar 2024
FACTOID: FACtual enTailment fOr hallucInation Detection
FACTOID: FACtual enTailment fOr hallucInation Detection
Vipula Rawte
S. M. Towhidul
Krishnav Rajbangshi
Shravani Nag
Vasu Sharma
Amit P. Sheth
Amitava Das
HILM
235
9
0
28 Mar 2024
Rejection Improves Reliability: Training LLMs to Refuse Unknown
  Questions Using RL from Knowledge Feedback
Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback
Hongshen Xu
Zichen Zhu
Situo Zhang
Da Ma
Shuai Fan
Lu Chen
Kai Yu
HILM
277
55
0
27 Mar 2024
CheckEval: A reliable LLM-as-a-Judge framework for evaluating text generation using checklists
CheckEval: A reliable LLM-as-a-Judge framework for evaluating text generation using checklists
Yukyung Lee
Joonghoon Kim
Jaehee Kim
Hyowon Cho
Jaewook Kang
Pilsung Kang
Najoung Kim
ELM
240
5
0
27 Mar 2024
Attribute First, then Generate: Locally-attributable Grounded Text
  Generation
Attribute First, then Generate: Locally-attributable Grounded Text Generation
Aviv Slobodkin
Eran Hirsch
Arie Cattan
Tal Schuster
Ido Dagan
334
43
0
25 Mar 2024
Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art
Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art
Neeloy Chakraborty
Melkior Ornik
Katherine Driggs-Campbell
LRM
396
27
0
25 Mar 2024
Reinforcement Learning from Reflective Feedback (RLRF): Aligning and
  Improving LLMs via Fine-Grained Self-Reflection
Reinforcement Learning from Reflective Feedback (RLRF): Aligning and Improving LLMs via Fine-Grained Self-Reflection
Kyungjae Lee
Dasol Hwang
Sunghyun Park
Youngsoo Jang
Moontae Lee
186
14
0
21 Mar 2024
A Closer Look at Claim Decomposition
A Closer Look at Claim Decomposition
Miriam Wanner
Seth Ebner
Zhengping Jiang
Mark Dredze
Benjamin Van Durme
224
35
0
18 Mar 2024
TriSum: Learning Summarization Ability from Large Language Models with
  Structured Rationale
TriSum: Learning Summarization Ability from Large Language Models with Structured RationaleNorth American Chapter of the Association for Computational Linguistics (NAACL), 2024
Pengcheng Jiang
Cao Xiao
Zifeng Wang
Parminder Bhatia
Jimeng Sun
Jiawei Han
LRM
195
16
0
15 Mar 2024
Think Twice Before Trusting: Self-Detection for Large Language Models
  through Comprehensive Answer Reflection
Think Twice Before Trusting: Self-Detection for Large Language Models through Comprehensive Answer ReflectionConference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Moxin Li
Wenjie Wang
Fuli Feng
Fengbin Zhu
Qifan Wang
Tat-Seng Chua
HILMLRM
316
31
0
15 Mar 2024
ClaimVer: Explainable Claim-Level Verification and Evidence Attribution
  of Text Through Knowledge Graphs
ClaimVer: Explainable Claim-Level Verification and Evidence Attribution of Text Through Knowledge GraphsConference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Preetam Prabhu Srikar Dammu
Himanshu Naidu
Mouly Dewan
YoungMin Kim
Tanya Roosta
Aman Chadha
Chirag Shah
378
13
0
12 Mar 2024
Truth-Aware Context Selection: Mitigating Hallucinations of Large
  Language Models Being Misled by Untruthful Contexts
Truth-Aware Context Selection: Mitigating Hallucinations of Large Language Models Being Misled by Untruthful ContextsAnnual Meeting of the Association for Computational Linguistics (ACL), 2024
Tian Yu
Shaolei Zhang
Yang Feng
HILM
162
13
0
12 Mar 2024
Unfamiliar Finetuning Examples Control How Language Models Hallucinate
Unfamiliar Finetuning Examples Control How Language Models HallucinateNorth American Chapter of the Association for Computational Linguistics (NAACL), 2024
Katie Kang
Eric Wallace
Claire Tomlin
Aviral Kumar
Sergey Levine
HILMLRM
260
83
0
08 Mar 2024
ERBench: An Entity-Relationship based Automatically Verifiable
  Hallucination Benchmark for Large Language Models
ERBench: An Entity-Relationship based Automatically Verifiable Hallucination Benchmark for Large Language ModelsNeural Information Processing Systems (NeurIPS), 2024
Jio Oh
Soyeon Kim
Junseok Seo
Yongfeng Zhang
Ruochen Xu
Xing Xie
Steven Euijong Whang
164
11
0
08 Mar 2024
Fact-Checking the Output of Large Language Models via Token-Level
  Uncertainty Quantification
Fact-Checking the Output of Large Language Models via Token-Level Uncertainty QuantificationAnnual Meeting of the Association for Computational Linguistics (ACL), 2024
Ekaterina Fadeeva
Aleksandr Rubashevskii
Artem Shelmanov
Sergey Petrakov
Jinyan Su
...
Gleb Kuzmin
Sergey Petrakov
Timothy Baldwin
Preslav Nakov
Maxim Panov
HILM
296
94
0
07 Mar 2024
FaaF: Facts as a Function for the evaluation of generated text
FaaF: Facts as a Function for the evaluation of generated text
Vasileios Katranidis
Gabor Barany
HILMRALM
174
5
0
06 Mar 2024
A Modular Approach for Multimodal Summarization of TV Shows
A Modular Approach for Multimodal Summarization of TV Shows
Louis Mahon
Mirella Lapata
413
11
0
06 Mar 2024
Multimodal Large Language Models to Support Real-World Fact-Checking
Multimodal Large Language Models to Support Real-World Fact-Checking
Fauzan Farooqui
Yova Kementchedjhieva
Preslav Nakov
Iryna Gurevych
LRM
284
23
0
06 Mar 2024
Benchmarking Hallucination in Large Language Models based on
  Unanswerable Math Word Problem
Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem
Yuhong Sun
Zhangyue Yin
Qipeng Guo
Jiawen Wu
Xipeng Qiu
Hui Zhao
131
36
0
06 Mar 2024
Reliable, Adaptable, and Attributable Language Models with Retrieval
Reliable, Adaptable, and Attributable Language Models with Retrieval
Akari Asai
Zexuan Zhong
Danqi Chen
Pang Wei Koh
Luke Zettlemoyer
Hanna Hajishirzi
Anuj Kumar
KELMRALM
281
79
0
05 Mar 2024
FENICE: Factuality Evaluation of summarization based on Natural language
  Inference and Claim Extraction
FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction
Alessandro Sciré
Karim Ghonim
Roberto Navigli
HILM
202
20
0
04 Mar 2024
WebCiteS: Attributed Query-Focused Summarization on Chinese Web Search
  Results with Citations
WebCiteS: Attributed Query-Focused Summarization on Chinese Web Search Results with Citations
Haolin Deng
Chang Wang
Xin Li
Dezhang Yuan
Junlang Zhan
Tianhua Zhou
Jin Ma
Jun Gao
Ruifeng Xu
HILM
197
4
0
04 Mar 2024
SyllabusQA: A Course Logistics Question Answering Dataset
SyllabusQA: A Course Logistics Question Answering Dataset
Nigel Fernandez
Alexander Scarlatos
Andrew Lan
180
11
0
03 Mar 2024
Right for Right Reasons: Large Language Models for Verifiable Commonsense Knowledge Graph Question Answering
Right for Right Reasons: Large Language Models for Verifiable Commonsense Knowledge Graph Question Answering
Armin Toroghi
Willis Guo
Mohammad Mahdi Torabi pour
Scott Sanner
LRM
257
14
0
03 Mar 2024
A Survey of AI-generated Text Forensic Systems: Detection, Attribution,
  and Characterization
A Survey of AI-generated Text Forensic Systems: Detection, Attribution, and Characterization
Tharindu Kumarage
Garima Agrawal
Paras Sheth
Raha Moraffah
Amanat Chadha
Joshua Garland
Huan Liu
DeLMO
221
21
0
02 Mar 2024
Reading Subtext: Evaluating Large Language Models on Short Story
  Summarization with Writers
Reading Subtext: Evaluating Large Language Models on Short Story Summarization with Writers
Melanie Subbiah
Sean Zhang
Lydia B. Chilton
Kathleen McKeown
332
22
0
02 Mar 2024
Attribute Structuring Improves LLM-Based Evaluation of Clinical Text
  Summaries
Attribute Structuring Improves LLM-Based Evaluation of Clinical Text Summaries
Zelalem Gero
Chandan Singh
Yiqing Xie
Sheng Zhang
Tristan Naumann
Jianfeng Gao
Hoifung Poon
ELMALM
173
7
0
01 Mar 2024
Do Zombies Understand? A Choose-Your-Own-Adventure Exploration of
  Machine Cognition
Do Zombies Understand? A Choose-Your-Own-Adventure Exploration of Machine Cognition
Ariel Goldstein
Gabriel Stanovsky
174
2
0
01 Mar 2024
Whispers that Shake Foundations: Analyzing and Mitigating False Premise
  Hallucinations in Large Language Models
Whispers that Shake Foundations: Analyzing and Mitigating False Premise Hallucinations in Large Language Models
Hongbang Yuan
Pengfei Cao
Zhuoran Jin
Yubo Chen
Daojian Zeng
Kang Liu
Jun Zhao
HILM
184
9
0
29 Feb 2024
Multi-FAct: Assessing Multilingual LLMs' Multi-Regional Knowledge using
  FActScore
Multi-FAct: Assessing Multilingual LLMs' Multi-Regional Knowledge using FActScore
Sheikh Shafayat
Eunsu Kim
Juhyun Oh
Alice Oh
HILM
226
6
0
28 Feb 2024
Collaborative decoding of critical tokens for boosting factuality of
  large language models
Collaborative decoding of critical tokens for boosting factuality of large language models
Lifeng Jin
Baolin Peng
Linfeng Song
Haitao Mi
Ye Tian
Dong Yu
HILM
145
8
0
28 Feb 2024
Evaluating Very Long-Term Conversational Memory of LLM Agents
Evaluating Very Long-Term Conversational Memory of LLM Agents
A. Maharana
Dong-Ho Lee
Sergey Tulyakov
Mohit Bansal
Francesco Barbieri
Yuwei Fang
LLMAG
290
170
0
27 Feb 2024
Case-Based or Rule-Based: How Do Transformers Do the Math?
Case-Based or Rule-Based: How Do Transformers Do the Math?
Yi Hu
Xiaojuan Tang
Haotong Yang
Muhan Zhang
LRM
338
27
0
27 Feb 2024
Fine-Grained Natural Language Inference Based Faithfulness Evaluation
  for Diverse Summarisation Tasks
Fine-Grained Natural Language Inference Based Faithfulness Evaluation for Diverse Summarisation Tasks
Huajian Zhang
Yumo Xu
Laura Perez-Beltrachini
HILM
169
23
0
27 Feb 2024
Re-Ex: Revising after Explanation Reduces the Factual Errors in LLM Responses
Re-Ex: Revising after Explanation Reduces the Factual Errors in LLM Responses
Juyeon Kim
Jeongeun Lee
Yoonho Chang
Chanyeol Choi
Junseong Kim
Jy-yong Sohn
KELMLRM
400
5
0
27 Feb 2024
HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination
  Tendency of LLMs
HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs
Cem Uluoglakci
T. Taşkaya-Temizel
HILM
145
4
0
25 Feb 2024
Evaluating Robustness of Generative Search Engine on Adversarial Factual
  Questions
Evaluating Robustness of Generative Search Engine on Adversarial Factual Questions
Xuming Hu
Xiaochuan Li
Junzhe Chen
Hai-Tao Zheng
Yangning Li
...
Yasheng Wang
Qun Liu
Lijie Wen
Philip S. Yu
Zhijiang Guo
AAMLELM
162
8
0
25 Feb 2024
HD-Eval: Aligning Large Language Model Evaluators Through Hierarchical
  Criteria Decomposition
HD-Eval: Aligning Large Language Model Evaluators Through Hierarchical Criteria Decomposition
Yuxuan Liu
Tianchi Yang
Shaohan Huang
Zihan Zhang
Haizhen Huang
Furu Wei
Weiwei Deng
Feng Sun
Qi Zhang
164
23
0
24 Feb 2024
Fine-Grained Self-Endorsement Improves Factuality and Reasoning
Fine-Grained Self-Endorsement Improves Factuality and Reasoning
Ante Wang
Linfeng Song
Baolin Peng
Ye Tian
Lifeng Jin
Haitao Mi
Jinsong Su
Dong Yu
HILMLRM
128
9
0
23 Feb 2024
Fast Adversarial Attacks on Language Models In One GPU Minute
Fast Adversarial Attacks on Language Models In One GPU Minute
Vinu Sankar Sadasivan
Shoumik Saha
Gaurang Sriramanan
Priyatham Kattakinda
Atoosa Malemir Chegini
Soheil Feizi
MIALM
274
63
0
23 Feb 2024
Faithful Temporal Question Answering over Heterogeneous Sources
Faithful Temporal Question Answering over Heterogeneous Sources
Zhen Jia
Philipp Christmann
Gerhard Weikum
200
15
0
23 Feb 2024
UFO: a Unified and Flexible Framework for Evaluating Factuality of Large
  Language Models
UFO: a Unified and Flexible Framework for Evaluating Factuality of Large Language Models
Zhaoheng Huang
Zhicheng Dou
Yutao Zhu
Ji-Rong Wen
HILM
113
2
0
22 Feb 2024
Assisting in Writing Wikipedia-like Articles From Scratch with Large
  Language Models
Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models
Yijia Shao
Yucheng Jiang
Theodore A. Kanell
Peter Xu
Omar Khattab
Monica S. Lam
LLMAGKELM
224
101
0
22 Feb 2024
RefuteBench: Evaluating Refuting Instruction-Following for Large
  Language Models
RefuteBench: Evaluating Refuting Instruction-Following for Large Language Models
Jianhao Yan
Yun Luo
Yue Zhang
ALMLRM
244
12
0
21 Feb 2024
Factual consistency evaluation of summarization in the Era of large language models
Factual consistency evaluation of summarization in the Era of large language models
Zheheng Luo
Qianqian Xie
Sophia Ananiadou
HILM
170
0
0
21 Feb 2024
Identifying Factual Inconsistencies in Summaries: Grounding Model
  Inference via Task Taxonomy
Identifying Factual Inconsistencies in Summaries: Grounding Model Inference via Task Taxonomy
Liyan Xu
Zhenlin Su
Mo Yu
Jin Xu
Jinho D. Choi
Jie Zhou
Fei Liu
HILM
258
5
0
20 Feb 2024
GenAudit: Fixing Factual Errors in Language Model Outputs with Evidence
GenAudit: Fixing Factual Errors in Language Model Outputs with Evidence
Kundan Krishna
S. Ramprasad
Prakhar Gupta
Byron C. Wallace
Zachary Chase Lipton
Jeffrey P. Bigham
HILMKELMSyDa
316
15
0
19 Feb 2024
Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When
  and What to Retrieve for LLMs
Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs
Jiejun Tan
Zhicheng Dou
Yutao Zhu
Peidong Guo
Kun Fang
Ji-Rong Wen
298
53
0
19 Feb 2024
Ask Optimal Questions: Aligning Large Language Models with Retriever's Preference in Conversation
Ask Optimal Questions: Aligning Large Language Models with Retriever's Preference in Conversation
Chanwoong Yoon
Gangwoo Kim
Byeongguk Jeon
Sungdong Kim
Yohan Jo
Jaewoo Kang
KELMRALM
269
0
0
19 Feb 2024
Previous
123...101112139
Next