2305.14251
Cited By
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
23 May 2023
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Wen-tau Yih
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM
ALM
Papers citing
"FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"
50 / 454 papers shown
Title
EVA-Score: Evaluation of Long-form Summarization on Informativeness through Extraction and Validation
Yuchen Fan
Xin Zhong
Chengsi Wang
Gaoche Wu
Bowen Zhou
34
2
0
06 Jul 2024
ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models
Yuzhe Gu
Ziwei Ji
Wenwei Zhang
Chengqi Lyu
Dahua Lin
Kai Chen
HILM
34
5
0
05 Jul 2024
LLM Internal States Reveal Hallucination Risk Faced With a Query
Ziwei Ji
Delong Chen
Etsuko Ishii
Samuel Cahyawijaya
Yejin Bang
Bryan Wilie
Pascale Fung
HILM
LRM
34
19
0
03 Jul 2024
Towards a Holistic Framework for Multimodal Large Language Models in Three-dimensional Brain CT Report Generation
Cheng-Yi Li
Kao-Jung Chang
Cheng-Fu Yang
Hsin-Yu Wu
Wenting Chen
...
Yu-Chun Chen
Shih-Pin Chen
J. Lirng
Kai-Wei Chang
Shih-Hwa Chiou
LM&MA
MedIm
21
3
0
02 Jul 2024
DiscoveryBench: Towards Data-Driven Discovery with Large Language Models
Bodhisattwa Prasad Majumder
Harshit Surana
Dhruv Agarwal
Bhavana Dalvi Mishra
Abhijeetsingh Meena
Aryan Prakhar
Tirth Vora
Tushar Khot
Ashish Sabharwal
Peter Clark
ELM
37
9
0
01 Jul 2024
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems
Philippe Laban
Alexander R. Fabbri
Caiming Xiong
Chien-Sheng Wu
RALM
38
41
0
01 Jul 2024
Face4RAG: Factual Consistency Evaluation for Retrieval Augmented Generation in Chinese
Yunqi Xu
Tianchi Cai
Jiyan Jiang
Xierui Song
33
2
0
01 Jul 2024
FineSurE: Fine-grained Summarization Evaluation using LLMs
Hwanjun Song
Hang Su
Igor Shalyminov
Jason (Jinglun) Cai
Saab Mansour
HILM
26
29
0
01 Jul 2024
PFME: A Modular Approach for Fine-grained Hallucination Detection and Editing of Large Language Models
Kunquan Deng
Zeyu Huang
Chen Li
Chenghua Lin
Min Gao
Wenge Rong
KELM
26
0
0
29 Jun 2024
From RAG to RICHES: Retrieval Interlaced with Sequence Generation
Palak Jain
Livio Baldini Soares
Tom Kwiatkowski
VLM
21
4
0
29 Jun 2024
Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification
Anisha Gunjal
Greg Durrett
HILM
44
13
0
28 Jun 2024
Scalable and Domain-General Abstractive Proposition Segmentation
Mohammad Javad Hosseini
Yang Gao
Tim Baumgärtner
Alex Fabrikant
Reinald Kim Amplayo
31
0
0
28 Jun 2024
VERISCORE: Evaluating the factuality of verifiable claims in long-form text generation
Yixiao Song
Yekyung Kim
Mohit Iyyer
HILM
32
23
0
27 Jun 2024
CaLMQA: Exploring culturally specific long-form question answering across 23 languages
Shane Arora
Marzena Karpinska
Hung-Ting Chen
Ipsita Bhattacharjee
Mohit Iyyer
Eunsol Choi
HILM
43
11
0
25 Jun 2024
Mitigating Hallucination in Fictional Character Role-Play
Nafis Sadeq
Zhouhang Xie
Byungkyu Kang
Prarit Lamba
Xiang Gao
Julian McAuley
HILM
33
6
0
25 Jun 2024
CLERC: A Dataset for Legal Case Retrieval and Retrieval-Augmented Analysis Generation
Abe Bohan Hou
Orion Weller
Guanghui Qin
Eugene Yang
Dawn J Lawrie
Nils Holzenberger
Andrew Blair-Stanek
Benjamin Van Durme
AILaw
ELM
63
5
0
24 Jun 2024
One Thousand and One Pairs: A "novel" challenge for long-context language models
Marzena Karpinska
Katherine Thai
Kyle Lo
Tanya Goyal
Mohit Iyyer
LRM
36
40
0
24 Jun 2024
Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization
Cheng-Yu Hsieh
Yung-Sung Chuang
Chun-Liang Li
Zifeng Wang
Long T. Le
...
James R. Glass
Alexander Ratner
Chen-Yu Lee
Ranjay Krishna
Tomas Pfister
40
30
0
23 Jun 2024
Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs
Jannik Kossen
Jiatong Han
Muhammed Razzak
Lisa Schut
Shreshth A. Malik
Yarin Gal
HILM
46
33
0
22 Jun 2024
MU-Bench: A Multitask Multimodal Benchmark for Machine Unlearning
Jiali Cheng
Hadi Amiri
BDL
33
3
0
21 Jun 2024
Factual Dialogue Summarization via Learning from Large Language Models
Rongxin Zhu
Jey Han Lau
Jianzhong Qi
HILM
46
1
0
20 Jun 2024
An Analysis of Multilingual FActScore
Kim Trong Vu
Michael Krumdick
Varshini Reddy
Franck Dernoncourt
Viet Dac Lai
HILM
39
0
0
20 Jun 2024
PostMark: A Robust Blackbox Watermark for Large Language Models
Yapei Chang
Kalpesh Krishna
Amir Houmansadr
John Wieting
Mohit Iyyer
34
5
0
20 Jun 2024
Selected Languages are All You Need for Cross-lingual Truthfulness Transfer
Weihao Liu
Ning Wu
Wenbiao Ding
Shining Liang
Ming Gong
Dongmei Zhang
HILM
32
0
0
20 Jun 2024
WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia
Yufang Hou
Alessandra Pascale
Javier Carnerero-Cano
T. Tchrakian
Radu Marinescu
Elizabeth M. Daly
Inkit Padhi
P. Sattigeri
34
6
0
19 Jun 2024
Synchronous Faithfulness Monitoring for Trustworthy Retrieval-Augmented Generation
Di Wu
Jia-Chen Gu
Fan Yin
Nanyun Peng
Kai-Wei Chang
HILM
53
1
0
19 Jun 2024
Finding Blind Spots in Evaluator LLMs with Interpretable Checklists
Sumanth Doddapaneni
Mohammed Safi Ur Rahman Khan
Sshubam Verma
Mitesh Khapra
34
11
0
19 Jun 2024
Do Multimodal Foundation Models Understand Enterprise Workflows? A Benchmark for Business Process Management Tasks
Michael Wornow
A. Narayan
Ben T Viggiano
Ishan S. Khare
Tathagat Verma
...
Joshua Martinez
Vardhan Agrawal
Althea Hudson
N. Shah
Christopher Ré
35
4
0
19 Jun 2024
Towards Robust Evaluation: A Comprehensive Taxonomy of Datasets and Metrics for Open Domain Question Answering in the Era of Large Language Models
Akchay Srivastava
Atif Memon
ELM
34
1
0
19 Jun 2024
Learning to Generate Answers with Citations via Factual Consistency Models
Rami Aly
Zhiqiang Tang
Samson Tan
George Karypis
HILM
34
4
0
19 Jun 2024
Estimating Knowledge in Large Language Models Without Generating a Single Token
Daniela Gottesman
Mor Geva
37
10
0
18 Jun 2024
Beyond Under-Alignment: Atomic Preference Enhanced Factuality Tuning for Large Language Models
Hongbang Yuan
Yubo Chen
Pengfei Cao
Zhuoran Jin
Kang Liu
Jun Zhao
36
0
0
18 Jun 2024
Satyrn: A Platform for Analytics Augmented Generation
Marko Sterbentz
Cameron Barrie
Shubham Shahi
Abhratanu Dutta
Donna Hooshmand
Harper Pack
Kristian J. Hammond
27
0
0
17 Jun 2024
Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector
Xiaoxue Cheng
Junyi Li
Wayne Xin Zhao
Hongzhi Zhang
Fuzheng Zhang
Di Zhang
Kun Gai
Ji-Rong Wen
HILM
LLMAG
29
7
0
17 Jun 2024
Self-training Large Language Models through Knowledge Detection
Wei Jie Yeo
Teddy Ferdinan
Przemyslaw Kazienko
Ranjan Satapathy
Erik Cambria
41
9
0
17 Jun 2024
Aligning Large Language Models from Self-Reference AI Feedback with one General Principle
Rong Bao
Rui Zheng
Shihan Dou
Xiao Wang
Enyu Zhou
Bo Wang
Qi Zhang
Liang Ding
Dacheng Tao
ALM
48
0
0
17 Jun 2024
Large language model validity via enhanced conformal prediction methods
John J. Cherian
Isaac Gibbs
Emmanuel J. Candès
26
21
0
14 Jun 2024
DR-RAG: Applying Dynamic Document Relevance to Retrieval-Augmented Generation for Question-Answering
Zijian Hei
Weiling Liu
Wenjie Ou
Juyi Qiao
Junming Jiao
Guowen Song
Ting Tian
Yi Lin
RALM
38
5
0
11 Jun 2024
HalluDial: A Large-Scale Benchmark for Automatic Dialogue-Level Hallucination Evaluation
Wen Luo
Tianshu Shen
Wei Li
Guangyue Peng
Richeng Xuan
Houfeng Wang
Xi Yang
HILM
26
10
0
11 Jun 2024
Post-Hoc Answer Attribution for Grounded and Trustworthy Long Document Comprehension: Task, Insights, and Challenges
Abhilasha Sancheti
Koustava Goswami
Balaji Vasan Srinivasan
RALM
23
1
0
11 Jun 2024
A Probabilistic Framework for LLM Hallucination Detection via Belief Tree Propagation
Bairu Hou
Yang Zhang
Jacob Andreas
Shiyu Chang
69
5
0
11 Jun 2024
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
Joongwon Kim
Bhargavi Paranjape
Tushar Khot
Hannaneh Hajishirzi
LM&Ro
ELM
LLMAG
LRM
34
8
0
10 Jun 2024
Verifiable Generation with Subsentence-Level Fine-Grained Citations
Shuyang Cao
Lu Wang
19
6
0
10 Jun 2024
Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation
Neeraj Varshney
Satyam Raj
Venkatesh Mishra
Agneet Chatterjee
Ritika Sarkar
Amir Saeidi
Chitta Baral
LRM
31
7
0
08 Jun 2024
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
Bill Yuchen Lin
Yuntian Deng
Khyathi Raghavi Chandu
Faeze Brahman
Abhilasha Ravichander
Valentina Pyatkin
Nouha Dziri
Ronan Le Bras
Yejin Choi
35
65
0
07 Jun 2024
MAIRA-2: Grounded Radiology Report Generation
Shruthi Bannur
Kenza Bouzid
Daniel Coelho De Castro
Anton Schwaighofer
Sam Bond-Taylor
...
Anja Thieme
M. Lungren
Maria T. A. Wetscherek
Javier Alvarez-Valle
Stephanie L. Hyland
40
33
0
06 Jun 2024
PaCE: Parsimonious Concept Engineering for Large Language Models
Jinqi Luo
Tianjiao Ding
Kwan Ho Ryan Chan
D. Thaker
Aditya Chattopadhyay
Chris Callison-Burch
René Vidal
CVBM
35
7
0
06 Jun 2024
AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways
Zehang Deng
Yongjian Guo
Changzhou Han
Wanlun Ma
Junwu Xiong
Sheng Wen
Yang Xiang
42
22
0
04 Jun 2024
Safeguarding Large Language Models: A Survey
Yi Dong
Ronghui Mu
Yanghao Zhang
Siqi Sun
Tianle Zhang
...
Yi Qi
Jinwei Hu
Jie Meng
Saddek Bensalem
Xiaowei Huang
OffRL
KELM
AILaw
35
17
0
03 Jun 2024
When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs
Ryo Kamoi
Yusen Zhang
Nan Zhang
Jiawei Han
Rui Zhang
LRM
40
57
0
03 Jun 2024