2310.06498
Cited By
A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection
10 October 2023
Shiping Yang
Renliang Sun
Xiao-Yi Wan
HILM
Papers citing "A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection"
39 / 39 papers shown
Title
Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations
Ziwei Ji
L. Yu
Yeskendir Koishekenov
Yejin Bang
Anthony Hartshorn
Alan Schelten
Cheng Zhang
Pascale Fung
Nicola Cancedda
41
1
0
18 Mar 2025
Benchmarking Chinese Medical LLMs: A Medbench-based Analysis of Performance Gaps and Hierarchical Optimization Strategies
Luyi Jiang
J. Chen
Lu Lu
Xinwei Peng
Lihao Liu
Junjun He
Jie Xu
ELM
LM&MA
28
0
0
10 Mar 2025
Quantifying the Robustness of Retrieval-Augmented Language Models Against Spurious Features in Grounding Data
Shiping Yang
Jie Wu
Wenbiao Ding
Ning Wu
Shining Liang
Ming Gong
Hengyuan Zhang
Dongmei Zhang
AAML
64
1
0
07 Mar 2025
How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild
Saad Obaid ul Islam
Anne Lauscher
Goran Glavaš
HILM
LRM
108
1
0
21 Feb 2025
Time-Reversal Provides Unsupervised Feedback to LLMs
Yerram Varun
Rahul Madhavan
Sravanti Addepalli
A. Suggala
Karthikeyan Shanmugam
Prateek Jain
LRM
SyDa
64
0
0
03 Dec 2024
SMoA: Improving Multi-agent Large Language Models with Sparse Mixture-of-Agents
Dawei Li
Zhen Tan
Peijia Qian
Yifan Li
Kumar Satvik Chaudhary
Lijie Hu
Jiayi Shen
40
6
0
05 Nov 2024
Controlled Automatic Task-Specific Synthetic Data Generation for Hallucination Detection
Yong Xie
Karan Aggarwal
Aitzaz Ahmad
Stephen Lau
30
0
0
16 Oct 2024
Zero-resource Hallucination Detection for Text Generation via Graph-based Contextual Knowledge Triples Modeling
Xinyue Fang
Zhen Huang
Zhiliang Tian
Minghui Fang
Ziyi Pan
Quntian Fang
Zhihua Wen
Hengyue Pan
Dongsheng Li
HILM
80
2
0
17 Sep 2024
Fostering Natural Conversation in Large Language Models with NICO: a Natural Interactive COnversation dataset
Renliang Sun
Mengyuan Liu
Shiping Yang
Rui Wang
Junqing He
Jiaxing Zhang
23
2
0
18 Aug 2024
ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models
Yuzhe Gu
Ziwei Ji
Wenwei Zhang
Chengqi Lyu
Dahua Lin
Kai Chen
HILM
22
5
0
05 Jul 2024
DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation
A. B. M. A. Rahman
Saeed Anwar
Muhammad Usman
Ajmal Mian
HILM
22
0
0
13 Jun 2024
HalluDial: A Large-Scale Benchmark for Automatic Dialogue-Level Hallucination Evaluation
Wen Luo
Tianshu Shen
Wei Li
Guangyue Peng
Richeng Xuan
Houfeng Wang
Xi Yang
HILM
21
4
0
11 Jun 2024
A Survey of Useful LLM Evaluation
Ji-Lun Peng
Sijia Cheng
Egil Diau
Yung-Yu Shih
Po-Heng Chen
Yen-Ting Lin
Yun-Nung Chen
LLMAG
ELM
24
12
0
03 Jun 2024
ANAH: Analytical Annotation of Hallucinations in Large Language Models
Ziwei Ji
Yuzhe Gu
Wenwei Zhang
Chengqi Lyu
Dahua Lin
Kai-xiang Chen
HILM
35
2
0
30 May 2024
RefChecker: Reference-based Fine-grained Hallucination Checker and Benchmark for Large Language Models
Xiangkun Hu
Dongyu Ru
Lin Qiu
Qipeng Guo
Tianhang Zhang
Yang Xu
Yun Luo
Pengfei Liu
Yue Zhang
Zheng-Wei Zhang
HILM
LRM
41
8
0
23 May 2024
Evaluating Consistency and Reasoning Capabilities of Large Language Models
Yash Saxena
Sarthak Chopra
Arunendra Mani Tripathi
ELM
LRM
28
5
0
25 Apr 2024
Fake Artificial Intelligence Generated Contents (FAIGC): A Survey of Theories, Detection Methods, and Opportunities
Xiaomin Yu
Yezhaohui Wang
Yanfang Chen
Zhen Tao
Dinghao Xi
Shichao Song
Simin Niu
Zhiyu Li
62
7
0
25 Apr 2024
Can We Catch the Elephant? A Survey of the Evolvement of Hallucination Evaluation on Natural Language Generation
Siya Qi
Yulan He
Zheng Yuan
LRM
HILM
25
1
0
18 Apr 2024
Less is More for Improving Automatic Evaluation of Factual Consistency
Tong Wang
Ninad Kulkarni
Yanjun Qi
ALM
30
2
0
09 Apr 2024
PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics
Derui Zhu
Dingfan Chen
Qing Li
Zongxiong Chen
Lei Ma
Jens Grossklags
Mario Fritz
HILM
29
3
0
06 Apr 2024
FACTOID: FACtual enTailment fOr hallucInation Detection
Vipula Rawte
S. M. Towhidul
Krishnav Rajbangshi
Shravani Nag
Aman Chadha
Amit P. Sheth
Amitava Das
HILM
22
3
0
28 Mar 2024
ERBench: An Entity-Relationship based Automatically Verifiable Hallucination Benchmark for Large Language Models
Jio Oh
Soyeon Kim
Junseok Seo
Jindong Wang
Ruochen Xu
Xing Xie
Steven Euijong Whang
25
1
0
08 Mar 2024
DiaHalu: A Dialogue-level Hallucination Evaluation Benchmark for Large Language Models
Kedi Chen
Qin Chen
Jie Zhou
Yishen He
Liang He
HILM
25
1
0
01 Mar 2024
Whispers that Shake Foundations: Analyzing and Mitigating False Premise Hallucinations in Large Language Models
Hongbang Yuan
Pengfei Cao
Zhuoran Jin
Yubo Chen
Daojian Zeng
Kang Liu
Jun Zhao
HILM
19
3
0
29 Feb 2024
HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs
Cem Uluoglakci
T. Taşkaya-Temizel
HILM
17
2
0
25 Feb 2024
Large Language Models for Data Annotation: A Survey
Zhen Tan
Dawei Li
Song Wang
Alimohammad Beigi
Bohan Jiang
Amrita Bhattacharjee
Mansooreh Karami
Jundong Li
Lu Cheng
Huan Liu
SyDa
37
44
0
21 Feb 2024
LLMAuditor: A Framework for Auditing Large Language Models Using Human-in-the-Loop
Maryam Amirizaniani
Jihan Yao
Adrian Lavergne
Elizabeth Snell Okada
Aman Chadha
Tanya Roosta
Chirag Shah
HILM
13
1
0
14 Feb 2024
AuditLLM: A Tool for Auditing Large Language Models Using Multiprobe Approach
Maryam Amirizaniani
Elias Martin
Tanya Roosta
Aman Chadha
Chirag Shah
13
2
0
14 Feb 2024
Contextualization Distillation from Large Language Model for Knowledge Graph Completion
Dawei Li
Zhen Tan
Tianlong Chen
Huan Liu
KELM
17
12
0
28 Jan 2024
HALO: An Ontology for Representing and Categorizing Hallucinations in Large Language Models
Navapat Nananukul
M. Kejriwal
HILM
21
3
0
08 Dec 2023
UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation
Xun Liang
Shichao Song
Simin Niu
Zhiyu Li
Feiyu Xiong
...
Zhaohui Wy
Dawei He
Peng Cheng
Zhonghao Wang
Haiying Deng
HILM
21
17
0
26 Nov 2023
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
Lei Huang
Weijiang Yu
Weitao Ma
Weihong Zhong
Zhangyin Feng
...
Qianglong Chen
Weihua Peng
Xiaocheng Feng
Bing Qin
Ting Liu
LRM
HILM
31
684
0
09 Nov 2023
SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency
Jiaxin Zhang
Zhuohang Li
Kamalika Das
Bradley Malin
Kumar Sricharan
HILM
LRM
12
56
0
03 Nov 2023
Multi-level Contrastive Learning for Script-based Character Understanding
Dawei Li
Hengyuan Zhang
Yanran Li
Shiping Yang
35
17
0
20 Oct 2023
Cognitive Mirage: A Review of Hallucinations in Large Language Models
Hongbin Ye
Tong Liu
Aijia Zhang
Wei Hua
Weiqiang Jia
HILM
16
76
0
13 Sep 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
197
2,232
0
22 Mar 2023
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
Potsawee Manakul
Adian Liusie
Mark J. F. Gales
HILM
LRM
145
386
0
15 Mar 2023
Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark
Nouha Dziri
Hannah Rashkin
Tal Linzen
David Reitter
ALM
185
79
0
30 Apr 2021
Language Models as Knowledge Bases?
Fabio Petroni
Tim Rocktäschel
Patrick Lewis
A. Bakhtin
Yuxiang Wu
Alexander H. Miller
Sebastian Riedel
KELM
AI4MH
393
2,216
0
03 Sep 2019