
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
23 May 2023
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Anuj Kumar
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM, ALM

Papers citing "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"

50 / 615 papers shown
WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries
Wenting Zhao
Tanya Goyal
Yu Ying Chiu
Liwei Jiang
Benjamin Newman
...
Khyathi Chandu
Ronan Le Bras
Claire Cardie
Yuntian Deng
Yejin Choi
HILM
193
20
0
24 Jul 2024
Enhancing LLM's Cognition via Structurization
Kai-Chun Liu
Zhihang Fu
Chao Chen
Wei Zhang
Rongxin Jiang
Fan Zhou
Yao-Shen Chen
Yue-bo Wu
Jieping Ye
257
1
0
23 Jul 2024
Halu-J: Critique-Based Hallucination Judge
Binjie Wang
Steffi Chern
Ethan Chern
Pengfei Liu
HILM
272
15
0
17 Jul 2024
Crafting the Path: Robust Query Rewriting for Information Retrieval
Ingeol Baek
Jimin Lee
Joonho Yang
Hwanhee Lee
212
9
0
17 Jul 2024
Localizing and Mitigating Errors in Long-form Question Answering
Rachneet Sachdeva
Yixiao Song
Mohit Iyyer
Iryna Gurevych
HILM
303
0
0
16 Jul 2024
GraphEval: A Knowledge-Graph Based LLM Hallucination Evaluation Framework
Hannah Sansford
Nicholas Richardson
Hermina Petric Maretic
Juba Nait Saada
223
32
0
15 Jul 2024
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
Zilong Wang
Zifeng Wang
Long Le
Huaixiu Steven Zheng
Swaroop Mishra
...
Anush Mattapalli
Ankur Taly
Jingbo Shang
Chen-Yu Lee
Tomas Pfister
RALM
318
73
0
11 Jul 2024
Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps
Yung-Sung Chuang
Linlu Qiu
Cheng-Yu Hsieh
Ranjay Krishna
Yoon Kim
James R. Glass
HILM
240
84
0
09 Jul 2024
STORYSUMM: Evaluating Faithfulness in Story Summarization
Melanie Subbiah
Faisal Ladhak
Akankshya Mishra
Griffin Adams
Lydia B. Chilton
Kathleen McKeown
430
8
0
09 Jul 2024
Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models
Jinliang Lu
Ziliang Pang
Min Xiao
Yaochen Zhu
Rui Xia
Jiajun Zhang
MoMe
384
48
0
08 Jul 2024
KG-FPQ: Evaluating Factuality Hallucination in LLMs with Knowledge Graph-based False Premise Questions
Yanxu Zhu
Jinlin Xiao
Yuhang Wang
Jitao Sang
HILM
186
7
0
08 Jul 2024
From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty
Maor Ivgi
Ori Yoran
Jonathan Berant
Mor Geva
HILM
424
9
0
08 Jul 2024
EVA-Score: Evaluation of Long-form Summarization on Informativeness through Extraction and Validation
Wendi Li
Xin Zhong
Chengsi Wang
Gaoche Wu
Bowen Zhou
152
2
0
06 Jul 2024
ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models
Yuzhe Gu
Ziwei Ji
Wenwei Zhang
Chengqi Lyu
Dahua Lin
Kai Chen
HILM
205
12
0
05 Jul 2024
LLM Internal States Reveal Hallucination Risk Faced With a Query
Ziwei Ji
Delong Chen
Etsuko Ishii
Samuel Cahyawijaya
Yejin Bang
Bryan Wilie
Pascale Fung
HILM, LRM
292
64
0
03 Jul 2024
Towards a Holistic Framework for Multimodal Large Language Models in Three-dimensional Brain CT Report Generation
Cheng-Yi Li
Kao-Jung Chang
Cheng-Fu Yang
Hsin-Yu Wu
Wenting Chen
...
Yu-Chun Chen
Shih-Pin Chen
J. Lirng
Kai-Wei Chang
Shih-Hwa Chiou
LM&MA, MedIm
189
37
0
02 Jul 2024
DiscoveryBench: Towards Data-Driven Discovery with Large Language Models
Bodhisattwa Prasad Majumder
Harshit Surana
Dhruv Agarwal
Bhavana Dalvi Mishra
Abhijeetsingh Meena
Aryan Prakhar
Tirth Vora
Tushar Khot
Ashish Sabharwal
Peter Clark
ELM
216
35
0
01 Jul 2024
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems
Philippe Laban
Alexander R. Fabbri
Caiming Xiong
Chien-Sheng Wu
RALM
349
86
0
01 Jul 2024
Face4RAG: Factual Consistency Evaluation for Retrieval Augmented Generation in Chinese
Yunqi Xu
Tianchi Cai
Jiyan Jiang
Xierui Song
327
10
0
01 Jul 2024
FineSurE: Fine-grained Summarization Evaluation using LLMs
Hwanjun Song
Hang Su
Igor Shalyminov
Jason (Jinglun) Cai
Saab Mansour
HILM
402
77
0
01 Jul 2024
PFME: A Modular Approach for Fine-grained Hallucination Detection and Editing of Large Language Models
Kunquan Deng
Zeyu Huang
Chen Li
Chenghua Lin
Min Gao
Wenge Rong
KELM
185
2
0
29 Jun 2024
From RAG to RICHES: Retrieval Interlaced with Sequence Generation
Palak Jain
Livio Baldini Soares
Tom Kwiatkowski
VLM
192
6
0
29 Jun 2024
Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification
Anisha Gunjal
Greg Durrett
HILM
271
35
0
28 Jun 2024
Scalable and Domain-General Abstractive Proposition Segmentation
Mohammad Javad Hosseini
Yang Gao
Tim Baumgärtner
Alex Fabrikant
Reinald Kim Amplayo
178
0
0
28 Jun 2024
VERISCORE: Evaluating the factuality of verifiable claims in long-form text generation
Yixiao Song
Yekyung Kim
Mohit Iyyer
HILM
241
72
0
27 Jun 2024
Mitigating Hallucination in Fictional Character Role-Play
Nafis Sadeq
Zhouhang Xie
Byungkyu Kang
Prarit Lamba
Xiang Gao
Julian McAuley
HILM
294
15
0
25 Jun 2024
CaLMQA: Exploring culturally specific long-form question answering across 23 languages
Shane Arora
Marzena Karpinska
Hung-Ting Chen
Ipsita Bhattacharjee
Mohit Iyyer
Eunsol Choi
HILM
449
22
0
25 Jun 2024
CLERC: A Dataset for Legal Case Retrieval and Retrieval-Augmented Analysis Generation
Abe Bohan Hou
Orion Weller
Guanghui Qin
Eugene Yang
Dawn J Lawrie
Nils Holzenberger
Andrew Blair-Stanek
Benjamin Van Durme
AILaw, ELM
371
19
0
24 Jun 2024
One Thousand and One Pairs: A "novel" challenge for long-context language models
Marzena Karpinska
Katherine Thai
Kyle Lo
Tanya Goyal
Mohit Iyyer
LRM
388
75
0
24 Jun 2024
Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization
Cheng-Yu Hsieh
Yung-Sung Chuang
Chun-Liang Li
Zifeng Wang
Long T. Le
...
James R. Glass
Alexander Ratner
Chen-Yu Lee
Ranjay Krishna
Tomas Pfister
347
72
0
23 Jun 2024
Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs
Jannik Kossen
Jiatong Han
Muhammed Razzak
Lisa Schut
Shreshth A. Malik
Yarin Gal
HILM
315
116
0
22 Jun 2024
MU-Bench: A Multitask Multimodal Benchmark for Machine Unlearning
Jiali Cheng
Hadi Amiri
BDL
315
10
0
21 Jun 2024
Factual Dialogue Summarization via Learning from Large Language Models
Rongxin Zhu
Jey Han Lau
Jianzhong Qi
HILM
267
6
0
20 Jun 2024
An Analysis of Multilingual FActScore
Kim Trong Vu
Michael Krumdick
Varshini Reddy
Franck Dernoncourt
Viet Dac Lai
HILM
345
3
0
20 Jun 2024
PostMark: A Robust Blackbox Watermark for Large Language Models
Yapei Chang
Kalpesh Krishna
Amir Houmansadr
John Wieting
Mohit Iyyer
185
20
0
20 Jun 2024
Selected Languages are All You Need for Cross-lingual Truthfulness Transfer
Weihao Liu
Ning Wu
Wenbiao Ding
Shining Liang
Ming Gong
Dongmei Zhang
HILM
360
0
0
20 Jun 2024
WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia
Yufang Hou
Alessandra Pascale
Javier Carnerero-Cano
T. Tchrakian
Radu Marinescu
Elizabeth M. Daly
Inkit Padhi
P. Sattigeri
176
25
0
19 Jun 2024
Synchronous Faithfulness Monitoring for Trustworthy Retrieval-Augmented Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Di Wu
Jia-Chen Gu
Fan Yin
Nanyun Peng
Kai-Wei Chang
HILM
142
5
0
19 Jun 2024
Finding Blind Spots in Evaluator LLMs with Interpretable Checklists
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Sumanth Doddapaneni
Mohammed Safi Ur Rahman Khan
Sshubam Verma
Mitesh Khapra
226
25
0
19 Jun 2024
Do Multimodal Foundation Models Understand Enterprise Workflows? A Benchmark for Business Process Management Tasks
Neural Information Processing Systems (NeurIPS), 2024
Michael Wornow
A. Narayan
Ben T Viggiano
Ishan S. Khare
Tathagat Verma
...
Joshua Martinez
Vardhan Agrawal
Althea Hudson
N. Shah
Christopher Ré
228
4
0
19 Jun 2024
Towards Robust Evaluation: A Comprehensive Taxonomy of Datasets and Metrics for Open Domain Question Answering in the Era of Large Language Models
IEEE Access, 2024
Akchay Srivastava
Atif Memon
ELM
207
2
0
19 Jun 2024
Learning to Generate Answers with Citations via Factual Consistency Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Rami Aly
Zhiqiang Tang
Samson Tan
George Karypis
HILM
255
10
0
19 Jun 2024
Estimating Knowledge in Large Language Models Without Generating a Single Token
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Daniela Gottesman
Mor Geva
263
28
0
18 Jun 2024
Beyond Under-Alignment: Atomic Preference Enhanced Factuality Tuning for Large Language Models
Hongbang Yuan
Yubo Chen
Pengfei Cao
Zhuoran Jin
Kang Liu
Jun Zhao
175
0
0
18 Jun 2024
Satyrn: A Platform for Analytics Augmented Generation
Marko Sterbentz
Cameron Barrie
Shubham Shahi
Abhratanu Dutta
Donna Hooshmand
Harper Pack
Kristian J. Hammond
172
1
0
17 Jun 2024
Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector
Xiaoxue Cheng
Junyi Li
Wayne Xin Zhao
Hongzhi Zhang
Fuzheng Zhang
Di Zhang
Kun Gai
Ji-Rong Wen
HILM, LLMAG
216
20
0
17 Jun 2024
Self-training Large Language Models through Knowledge Detection
Wei Jie Yeo
Teddy Ferdinan
Przemyslaw Kazienko
Frank Xing
Erik Cambria
232
15
0
17 Jun 2024
Aligning Large Language Models from Self-Reference AI Feedback with one General Principle
Rong Bao
Rui Zheng
Jiajun Sun
Xiao Wang
Enyu Zhou
Bo Wang
Qi Zhang
Liang Ding
Dacheng Tao
ALM
332
1
0
17 Jun 2024
Large language model validity via enhanced conformal prediction methods
Neural Information Processing Systems (NeurIPS), 2024
John J. Cherian
Isaac Gibbs
Emmanuel J. Candès
237
62
0
14 Jun 2024
DR-RAG: Applying Dynamic Document Relevance to Retrieval-Augmented Generation for Question-Answering
Zijian Hei
Weiling Liu
Wenjie Ou
Juyi Qiao
Junming Jiao
Guowen Song
Ting Tian
Yi Lin
RALM
362
18
0
11 Jun 2024