arXiv:2305.14251
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
23 May 2023
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Wen-tau Yih
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM
ALM
Papers citing "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"
50 / 454 papers shown
Benchmarking Large Language Models on Multiple Tasks in Bioinformatics NLP with Prompting
Jiyue Jiang
Pengan Chen
J. T. Wang
Dongchen He
Ziqin Wei
...
Yimin Fan
Xiangyu Shi
J. Sun
Chuan Wu
Y. Li
LM&MA
43
0
0
06 Mar 2025
DSVD: Dynamic Self-Verify Decoding for Faithful Generation in Large Language Models
Y. Guo
Yuchen Yang
Zhe Chen
Pingjie Wang
Yusheng Liao
Y. Zhang
Yanfeng Wang
Yu Wang
HILM
61
0
0
05 Mar 2025
AILS-NTUA at SemEval-2025 Task 3: Leveraging Large Language Models and Translation Strategies for Multilingual Hallucination Detection
Dimitra Karkani
Maria Lymperaiou
Giorgos Filandrianos
Nikolaos Spanos
Athanasios Voulodimos
Giorgos Stamou
HILM
LRM
77
0
0
04 Mar 2025
Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs
Yuzhe Gu
W. Zhang
Chengqi Lyu
D. Lin
Kai Chen
58
1
0
04 Mar 2025
LLM as a Broken Telephone: Iterative Generation Distorts Information
Amr Mohamed
Mingmeng Geng
Michalis Vazirgiannis
Guokan Shang
59
1
0
27 Feb 2025
Self-Memory Alignment: Mitigating Factual Hallucinations with Generalized Improvement
Siyuan Zhang
Y. Zhang
Yinpeng Dong
Hang Su
HILM
KELM
117
0
0
26 Feb 2025
Conformal Linguistic Calibration: Trading-off between Factuality and Specificity
Zhengping Jiang
Anqi Liu
Benjamin Van Durme
84
1
0
26 Feb 2025
Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in QA Agents
A. Lewis
Michael White
Jing Liu
T. Koike-Akino
K. Parsons
Y. Wang
HILM
51
0
0
26 Feb 2025
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
Hao Peng
Y. Qi
Xiaozhi Wang
Zijun Yao
Bin Xu
Lei Hou
Juanzi Li
ALM
LRM
52
4
0
26 Feb 2025
Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles
Kuang Wang
X. Li
S. M. I. Simon X. Yang
Li Zhou
Feng Jiang
H. Li
42
0
0
26 Feb 2025
FactReasoner: A Probabilistic Approach to Long-Form Factuality Assessment for Large Language Models
Radu Marinescu
D. Bhattacharjya
Junkyu Lee
T. Tchrakian
Javier Carnerero-Cano
Yufang Hou
Elizabeth M. Daly
Alessandra Pascale
HILM
LRM
56
0
0
25 Feb 2025
Faster, Cheaper, Better: Multi-Objective Hyperparameter Optimization for LLM and RAG Systems
Matthew Barker
Andrew Bell
Evan Thomas
James Carr
Thomas Andrews
Umang Bhatt
80
1
0
25 Feb 2025
Beyond Translation: LLM-Based Data Generation for Multilingual Fact-Checking
Yi-Ling Chung
Aurora Cobo
Pablo Serna
SyDa
HILM
58
0
0
24 Feb 2025
Is Relevance Propagated from Retriever to Generator in RAG?
Fangzheng Tian
Debasis Ganguly
Craig Macdonald
RALM
45
1
0
24 Feb 2025
PosterSum: A Multimodal Benchmark for Scientific Poster Summarization
Rohit Saxena
Pasquale Minervini
Frank Keller
VLM
64
0
0
24 Feb 2025
Is Free Self-Alignment Possible?
Dyah Adila
Changho Shin
Yijing Zhang
Frederic Sala
MoMe
108
2
0
24 Feb 2025
Grounded Persuasive Language Generation for Automated Marketing
Jibang Wu
Chenghao Yang
Simon Mahns
Chaoqi Wang
Hao Zhu
Fei Fang
Haifeng Xu
38
1
0
24 Feb 2025
GraphCheck: Breaking Long-Term Text Barriers with Extracted Knowledge Graph-Powered Fact-Checking
Yingjian Chen
Haoran Liu
Yinhong Liu
Rui Yang
Han Yuan
Yanran Fu
Pengyuan Zhou
Qingyu Chen
James Caverlee
Irene Z Li
HILM
46
0
0
23 Feb 2025
Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation
SeongYeub Chu
JongWoo Kim
MunYong Yi
55
1
0
21 Feb 2025
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
Zekun Xi
Wenbiao Yin
Jizhan Fang
Jialong Wu
Runnan Fang
N. Zhang
Jiang Yong
Pengjun Xie
Fei Huang
H. Chen
SyDa
LRM
100
6
0
21 Feb 2025
How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild
Saad Obaid ul Islam
Anne Lauscher
Goran Glavas
HILM
LRM
115
1
0
21 Feb 2025
Rate, Explain and Cite (REC): Enhanced Explanation and Attribution in Automatic Evaluation by Large Language Models
Aliyah R. Hsu
James Zhu
Zhichao Wang
Bin Bi
Shubham Mehrotra
...
Sougata Chaudhuri
Regunathan Radhakrishnan
S. Asur
Claire Na Cheng
Bin Yu
ALM
LRM
67
0
0
20 Feb 2025
Can Knowledge Graphs Make Large Language Models More Trustworthy? An Empirical Study Over Open-ended Question Answering
Yuan Sui
Yufei He
Zifeng Ding
Bryan Hooi
HILM
ELM
RALM
64
7
0
20 Feb 2025
Hallucination Detection in Large Language Models with Metamorphic Relations
Borui Yang
Md Afif Al Mamun
Jie M. Zhang
Gias Uddin
HILM
59
0
0
20 Feb 2025
Rare Disease Differential Diagnosis with Large Language Models at Scale: From Abdominal Actinomycosis to Wilson's Disease
Elliot Schumacher
Dhruv Naik
Anitha Kannan
LM&MA
36
0
0
20 Feb 2025
Can Your Uncertainty Scores Detect Hallucinated Entity?
Min-Hsuan Yeh
Max Kamachee
Seongheon Park
Yixuan Li
HILM
44
1
0
17 Feb 2025
Navigating the Helpfulness-Truthfulness Trade-Off with Uncertainty-Aware Instruction Fine-Tuning
Tianyi Wu
Jingwei Ni
Bryan Hooi
Jiaheng Zhang
Elliott Ash
See-Kiong Ng
Mrinmaya Sachan
Markus Leippold
51
0
0
17 Feb 2025
STRIVE: Structured Reasoning for Self-Improvement in Claim Verification
Haisong Gong
Jing Li
Junfei Wu
Qiang Liu
Shu Wu
Liang Wang
LRM
38
0
0
17 Feb 2025
Optimizing Knowledge Integration in Retrieval-Augmented Generation with Self-Selection
Yan Weng
Fengbin Zhu
Tong Ye
Haoyan Liu
Fuli Feng
Tat-Seng Chua
RALM
93
1
0
10 Feb 2025
OverThink: Slowdown Attacks on Reasoning LLMs
A. Kumar
Jaechul Roh
A. Naseh
Marzena Karpinska
Mohit Iyyer
Amir Houmansadr
Eugene Bagdasarian
LRM
57
12
0
04 Feb 2025
Context-Aware Hierarchical Merging for Long Document Summarization
Litu Ou
Mirella Lapata
MoMe
126
1
0
03 Feb 2025
Learning to Explore and Select for Coverage-Conditioned Retrieval-Augmented Generation
Takyoung Kim
Kyungjae Lee
Y. Jang
Ji Yong Cho
Gangwoo Kim
Minseok Cho
Moontae Lee
104
0
0
28 Jan 2025
OnionEval: An Unified Evaluation of Fact-conflicting Hallucination for Small-Large Language Models
Chongren Sun
Y. Li
Di Wu
Benoit Boulet
HILM
LRM
75
1
0
22 Jan 2025
Iterative Tree Analysis for Medical Critics
Zenan Huang
Mingwei Li
Zheng Zhou
Youxin Jiang
65
0
0
18 Jan 2025
Enhancing Retrieval-Augmented Generation: A Study of Best Practices
Siran Li
Linus Stenzel
Carsten Eickhoff
Seyed Ali Bahrainian
RALM
3DV
55
4
0
13 Jan 2025
Hallucination Detox: Sensitivity Dropout (SenD) for Large Language Model Training
Shahrad Mohammadzadeh
Juan David Guerra
Marco Bonizzato
Reihaneh Rabbany
Golnoosh Farnadi
HILM
49
0
0
08 Jan 2025
Lived Experience Not Found: LLMs Struggle to Align with Experts on Addressing Adverse Drug Reactions from Psychiatric Medication Use
Mohit Chandra
Siddharth Sriraman
Gaurav Verma
Harneet Singh Khanuja
Jose Suarez Campayo
Zihang Li
Michael L. Birnbaum
M. D. Choudhury
AI4MH
29
5
0
08 Jan 2025
The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input
Alon Jacovi
Andrew Wang
Chris Alberti
Connie Tao
Jon Lipovetz
...
Rachana Fellinger
Rui Wang
Zizhao Zhang
Sasha Goldshtein
Dipanjan Das
HILM
ALM
82
12
0
06 Jan 2025
A 2-step Framework for Automated Literary Translation Evaluation: Its Promises and Pitfalls
Sheikh Shafayat
Dongkeun Yoon
Woori Jang
Jiwoo Choi
Alice H. Oh
Seohyon Jung
91
1
0
03 Jan 2025
LLM-Rubric: A Multidimensional, Calibrated Approach to Automated Evaluation of Natural Language Texts
Helia Hashemi
J. Eisner
Corby Rosset
Benjamin Van Durme
Chris Kedzie
68
1
0
03 Jan 2025
A review of faithfulness metrics for hallucination assessment in Large Language Models
Ben Malin
Tatiana Kalganova
Nikoloas Boulgouris
HILM
59
2
0
03 Jan 2025
Evaluate Summarization in Fine-Granularity: Auto Evaluation with LLM
Dong Yuan
Eti Rastogi
Fen Zhao
Sagar Goyal
Gautam Naik
Sree Prasanna Rajagopal
36
0
0
31 Dec 2024
ComparisonQA: Evaluating Factuality Robustness of LLMs Through Knowledge Frequency Control and Uncertainty
Qing Zong
Z. Wang
Tianshi Zheng
Xiyu Ren
Y. Song
57
1
0
31 Dec 2024
Fine-grained and Explainable Factuality Evaluation for Multimodal Summarization
Liqiang Jing
Jingxuan Zuo
Yue Zhang
30
7
0
31 Dec 2024
LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation
Eunsu Kim
Juyoung Suk
Seungone Kim
Niklas Muennighoff
Dongkwan Kim
Alice H. Oh
ELM
78
1
0
31 Dec 2024
MapExplorer: New Content Generation from Low-Dimensional Visualizations
Xingjian Zhang
Ziyang Xiong
Shixuan Liu
Yutong Xie
Tolga Ergen
Dongsub Shim
Hua Xu
Honglak Lee
Qiaozhu Me
37
0
0
24 Dec 2024
A Survey of Calibration Process for Black-Box LLMs
Liangru Xie
Hui Liu
Jingying Zeng
Xianfeng Tang
Yan Han
Chen Luo
Jing Huang
Zhen Li
Suhang Wang
Qi He
74
1
0
17 Dec 2024
Attention with Dependency Parsing Augmentation for Fine-Grained Attribution
Qiang Ding
Lvzhou Luo
Yixuan Cao
Ping Luo
74
0
0
16 Dec 2024
Coverage-based Fairness in Multi-document Summarization
Haoyuan Li
Yusen Zhang
Rui Zhang
Snigdha Chaturvedi
70
0
0
11 Dec 2024
HalluCana: Fixing LLM Hallucination with A Canary Lookahead
Tianyi Li
Erenay Dayanik
Shubhi Tyagi
Andrea Pierleoni
HILM
70
0
0
10 Dec 2024