2305.14251
Cited By
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
23 May 2023
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Wen-tau Yih
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM
ALM
Papers citing
"FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"
50 / 454 papers shown
QAPyramid: Fine-grained Evaluation of Content Selection for Text Summarization
Shiyue Zhang
David Wan
Arie Cattan
Ayal Klein
Ido Dagan
Mohit Bansal
81
0
0
10 Dec 2024
Enhancing Trust in Large Language Models with Uncertainty-Aware Fine-Tuning
R. Krishnan
Piyush Khanna
Omesh Tickoo
HILM
69
1
0
03 Dec 2024
FactCheXcker: Mitigating Measurement Hallucinations in Chest X-ray Report Generation Models
Alice Heiman
Xiaoman Zhang
E. Chen
Sung Eun Kim
Pranav Rajpurkar
HILM
MedIm
77
0
0
27 Nov 2024
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM
AILaw
108
63
0
25 Nov 2024
Investigating Factuality in Long-Form Text Generation: The Roles of Self-Known and Self-Unknown
Lifu Tu
Rui Meng
Shafiq R. Joty
Yingbo Zhou
Semih Yavuz
HILM
67
0
0
24 Nov 2024
LLM Hallucination Reasoning with Zero-shot Knowledge Test
Seongmin Lee
Hsiang Hsu
Chun-Fu Chen
LRM
39
2
0
14 Nov 2024
Beyond the Safety Bundle: Auditing the Helpful and Harmless Dataset
Khaoula Chehbouni
Jonathan Colaço-Carr
Yash More
Jackie CK Cheung
G. Farnadi
73
0
0
12 Nov 2024
FactLens: Benchmarking Fine-Grained Fact Verification
Kushan Mitra
Dan Zhang
Sajjadur Rahman
Estevam R. Hruschka
HILM
38
1
0
08 Nov 2024
Measuring short-form factuality in large language models
Jason W. Wei
Karina Nguyen
Hyung Won Chung
Yunxin Joy Jiao
Spencer Papay
Amelia Glaese
John Schulman
W. Fedus
ELM
KELM
HILM
35
38
0
07 Nov 2024
Culinary Class Wars: Evaluating LLMs using ASH in Cuisine Transfer Task
Hoonick Lee
Mogan Gim
Donghyeon Park
Donghee Choi
Jaewoo Kang
26
0
0
04 Nov 2024
Human-inspired Perspectives: A Survey on AI Long-term Memory
Zihong He
Weizhe Lin
Hao Zheng
Fan Zhang
Matt Jones
Laurence Aitchison
X. Xu
Miao Liu
Per Ola Kristensson
Junxiao Shen
77
2
0
01 Nov 2024
The Automated Verification of Textual Claims (AVeriTeC) Shared Task
M. Schlichtkrull
Yulong Chen
Chenxi Whitehouse
Zhenyun Deng
Mubashara Akhtar
...
Christos Christodoulopoulos
O. Cocarascu
Arpit Mittal
James Thorne
Andreas Vlachos
39
6
0
31 Oct 2024
Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models
J. Wu
Tsz Ting Chung
Kai Chen
Dit-Yan Yeung
VLM
LRM
53
3
0
30 Oct 2024
Improving Uncertainty Quantification in Large Language Models via Semantic Embeddings
Yashvir S. Grewal
Edwin V. Bonilla
Thang D. Bui
UQCV
25
3
0
30 Oct 2024
FactBench: A Dynamic Benchmark for In-the-Wild Language Model Factuality Evaluation
Farima Fatahi Bayat
Lechen Zhang
Sheza Munir
Lu Wang
HILM
37
3
0
29 Oct 2024
LongReward: Improving Long-context Large Language Models with AI Feedback
J. Zhang
Zhongni Hou
Xin Lv
S. Cao
Zhenyu Hou
Yilin Niu
Lei Hou
Yuxiao Dong
Ling Feng
Juanzi Li
OffRL
LRM
33
7
0
28 Oct 2024
Graph-based Uncertainty Metrics for Long-form Language Model Outputs
Mingjian Jiang
Yangjun Ruan
Prasanna Sattigeri
Salim Roukos
Tatsunori Hashimoto
18
0
0
28 Oct 2024
Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning
Yujian Liu
Shiyu Chang
Tommi Jaakkola
Yang Zhang
23
0
0
25 Oct 2024
ChunkRAG: Novel LLM-Chunk Filtering Method for RAG Systems
Ishneet Sukhvinder Singh
Ritvik Aggarwal
Ibrahim Allahverdiyev
Muhammad Taha
Aslihan Akalin
Kevin Zhu
Sean O'Brien
23
8
0
25 Oct 2024
Improving Model Factuality with Fine-grained Critique-based Evaluator
Yiqing Xie
Wenxuan Zhou
Pradyot Prakash
Di Jin
Yuning Mao
...
Sinong Wang
Han Fang
Carolyn Rose
Daniel Fried
Hejia Zhang
HILM
33
5
0
24 Oct 2024
Multilingual Hallucination Gaps in Large Language Models
Cléa Chataigner
Afaf Taik
G. Farnadi
HILM
LRM
32
3
0
23 Oct 2024
Leveraging the Domain Adaptation of Retrieval Augmented Generation Models for Question Answering and Reducing Hallucination
Salman Rakin
Md. A. R. Shibly
Zahin M. Hossain
Zeeshan Khan
Md. Mostofa Akbar
23
1
0
23 Oct 2024
Enhancing Answer Attribution for Faithful Text Generation with Large Language Models
Juraj Vladika
Luca Mülln
Florian Matthes
23
0
0
22 Oct 2024
Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning
Zongmeng Zhang
Yufeng Shi
Jinhua Zhu
Wengang Zhou
Xiang Qi
Peng Zhang
H. Li
RALM
HILM
18
0
0
22 Oct 2024
Self-Explained Keywords Empower Large Language Models for Code Generation
Lishui Fan
Mouxiang Chen
Zhongxin Liu
38
1
0
21 Oct 2024
RAC: Efficient LLM Factuality Correction with Retrieval Augmentation
Changmao Li
Jeffrey Flanigan
KELM
LRM
24
0
0
21 Oct 2024
BRIEF: Bridging Retrieval and Inference for Multi-hop Reasoning via Compression
Yuankai Li
Jia-Chen Gu
Di Wu
Kai-Wei Chang
Nanyun Peng
RALM
MQ
18
0
0
20 Oct 2024
Cross-Document Event-Keyed Summarization
William Walden
Pavlo Kuchmiichuk
Alexander Martin
Chihsheng Jin
Angela Cao
Claire Sun
Curisia Allen
Aaron Steven White
RALM
28
0
0
18 Oct 2024
Tell me what I need to know: Exploring LLM-based (Personalized) Abstractive Multi-Source Meeting Summarization
Frederic Kirstein
Terry Ruas
Robert Kratel
Bela Gipp
21
2
0
18 Oct 2024
LoGU: Long-form Generation with Uncertainty Expressions
Ruihan Yang
Caiqi Zhang
Zhisong Zhang
Xinting Huang
Sen Yang
Nigel Collier
Dong Yu
Deqing Yang
HILM
24
3
0
18 Oct 2024
Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs
Sumanth Doddapaneni
Mohammed Safi Ur Rahman Khan
Dilip Venkatesh
Raj Dabre
Anoop Kunchukuttan
Mitesh M. Khapra
ELM
35
1
0
17 Oct 2024
FIRE: Fact-checking with Iterative Retrieval and Verification
Zhuohan Xie
Rui Xing
Yuxia Wang
Jiahui Geng
Hasan Iqbal
Dhruv Sahnan
Iryna Gurevych
Preslav Nakov
HILM
50
2
0
17 Oct 2024
Decomposition Dilemmas: Does Claim Decomposition Boost or Burden Fact-Checking Performance?
Qisheng Hu
Quanyu Long
Wenya Wang
66
4
0
17 Oct 2024
Probing-RAG: Self-Probing to Guide Language Models in Selective Document Retrieval
Ingeol Baek
Hwan Chang
Byeongjeong Kim
Jimin Lee
Hwanhee Lee
RALM
57
4
0
17 Oct 2024
From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization
Catarina G. Belem
Pouya Pezeshkpour
Hayate Iso
Seiji Maekawa
Nikita Bhutani
Estevam R. Hruschka
HILM
65
1
0
17 Oct 2024
A Claim Decomposition Benchmark for Long-form Answer Verification
Zhihao Zhang
Yixing Fan
Ruqing Zhang
J. Guo
HILM
28
0
0
16 Oct 2024
An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation
Junjie Chen
Weihang Su
Zhumin Chu
Haitao Li
Qingyao Ai
Yiqun Liu
Min Zhang
Shaoping Ma
24
3
0
16 Oct 2024
Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only
Jihan Yao
Wenxuan Ding
Shangbin Feng
Lucy Lu Wang
Yulia Tsvetkov
25
0
0
14 Oct 2024
Medico: Towards Hallucination Detection and Correction with Multi-source Evidence Fusion
Xinping Zhao
Jindi Yu
Zhenyu Liu
Jifang Wang
Dongfang Li
Yibin Chen
Baotian Hu
Min Zhang
HILM
18
0
0
14 Oct 2024
BookWorm: A Dataset for Character Description and Analysis
Argyrios Papoudakis
Mirella Lapata
Frank Keller
18
1
0
14 Oct 2024
ReIFE: Re-evaluating Instruction-Following Evaluation
Yixin Liu
Kejian Shi
Alexander R. Fabbri
Yilun Zhao
Peifeng Wang
Chien-Sheng Wu
Shafiq Joty
Arman Cohan
22
6
0
09 Oct 2024
Uncovering Factor Level Preferences to Improve Human-Model Alignment
Juhyun Oh
Eunsu Kim
Jiseon Kim
Wenda Xu
Inha Cha
William Yang Wang
Alice H. Oh
21
0
0
09 Oct 2024
LLM Self-Correction with DeCRIM: Decompose, Critique, and Refine for Enhanced Following of Instructions with Multiple Constraints
Thomas Palmeira Ferraz
Kartik Mehta
Yu-Hsiang Lin
Haw-Shiuan Chang
Shereen Oraby
Sijia Liu
Vivek Subramanian
Tagyoung Chung
Mohit Bansal
Nanyun Peng
48
7
0
09 Oct 2024
ReFIR: Grounding Large Restoration Models with Retrieval Augmentation
Hang Guo
Tao Dai
Zhihao Ouyang
Taolin Zhang
Yaohua Zha
Bin Chen
Shu-Tao Xia
DiffM
30
5
0
08 Oct 2024
Why am I seeing this: Democratizing End User Auditing for Online Content Recommendations
Chaoran Chen
Leyang Li
Luke Cao
Yanfang Ye
Tianshi Li
Yaxing Yao
Toby Jia-jun Li
MLAU
37
2
0
07 Oct 2024
Realizing Video Summarization from the Path of Language-based Semantic Understanding
Kuan-Chen Mu
Zhi-Yi Chin
Wei-Chen Chiu
13
0
0
06 Oct 2024
Alignment Between the Decision-Making Logic of LLMs and Human Cognition: A Case Study on Legal LLMs
Lu Chen
Yuxuan Huang
Yixing Li
Yaohui Jin
Shuai Zhao
Zilong Zheng
Quanshi Zhang
21
1
0
06 Oct 2024
Locating Information Gaps and Narrative Inconsistencies Across Languages: A Case Study of LGBT People Portrayals on Wikipedia
Farhan Samir
Chan Young Park
Anjalie Field
Vered Shwartz
Yulia Tsvetkov
28
1
0
05 Oct 2024
CS4: Measuring the Creativity of Large Language Models Automatically by Controlling the Number of Story-Writing Constraints
Anirudh Atmakuru
Jatin Nainani
Rohith Siddhartha Reddy Bheemreddy
Anirudh Lakkaraju
Zonghai Yao
Hamed Zamani
Haw-Shiuan Chang
63
2
0
05 Oct 2024
ECon: On the Detection and Resolution of Evidence Conflicts
Cheng Jiayang
Chunkit Chan
Qianqian Zhuang
Lin Qiu
Tianhang Zhang
Tengxiao Liu
Yangqiu Song
Yue Zhang
Pengfei Liu
Zheng Zhang
36
1
0
05 Oct 2024