HaluEval-Wild: Evaluating Hallucinations of Language Models in the Wild
Zhiying Zhu, Yiming Yang, Zhiqing Sun
arXiv:2403.04307 · 7 March 2024
Tags: HILM, VLM
Papers citing "HaluEval-Wild: Evaluating Hallucinations of Language Models in the Wild" (13 papers shown)
1. "Real-World Gaps in AI Governance Research." Ilan Strauss, Isobel Moure, Tim O'Reilly, Sruly Rosenblat. 30 Apr 2025.
2. "OAEI-LLM-T: A TBox Benchmark Dataset for Understanding Large Language Model Hallucinations in Ontology Matching." Zhangcheng Qiang, Kerry Taylor, Weiqing Wang, Jing Jiang. 25 Mar 2025.
3. "EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants." Franck Cappello, Sandeep Madireddy, Robert Underwood, N. Getty, Nicholas Chia, ..., M. Rafique, Eliu A. Huerta, B. Li, Ian Foster, Rick L. Stevens. 27 Feb 2025.
4. "The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input." Alon Jacovi, Andrew Wang, Chris Alberti, Connie Tao, Jon Lipovetz, ..., Rachana Fellinger, Rui Wang, Zizhao Zhang, Sasha Goldshtein, Dipanjan Das. 06 Jan 2025. Tags: HILM, ALM.
5. "WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries." Wenting Zhao, Tanya Goyal, Yu Ying Chiu, Liwei Jiang, Benjamin Newman, ..., Khyathi Raghavi Chandu, Ronan Le Bras, Claire Cardie, Yuntian Deng, Yejin Choi. 24 Jul 2024. Tags: HILM.
6. "From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty." Maor Ivgi, Ori Yoran, Jonathan Berant, Mor Geva. 08 Jul 2024. Tags: HILM.
7. "Enhancing Hallucination Detection through Perturbation-Based Synthetic Data Generation in System Responses." Dongxu Zhang, Varun Gangal, B. Lattimer, Yi Yang. 07 Jul 2024.
8. "Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness." Khyathi Raghavi Chandu, Linjie Li, Anas Awadalla, Ximing Lu, Jae Sung Park, Jack Hessel, Lijuan Wang, Yejin Choi. 02 Jul 2024.
9. "DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation." A. B. M. A. Rahman, Saeed Anwar, Muhammad Usman, Ajmal Mian. 13 Jun 2024. Tags: HILM.
10. "CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks." Maciej Besta, Lorenzo Paleari, Aleš Kubíček, Piotr Nyczyk, Robert Gerstenberger, Patrick Iff, Tomasz Lehmann, H. Niewiadomski, Torsten Hoefler. 04 Jun 2024.
11. "AI Governance and Accountability: An Analysis of Anthropic's Claude." Aman Priyanshu, Yash Maurya, Zuofei Hong. 02 May 2024.
12. "Detecting and Mitigating Hallucinations in Multilingual Summarisation." Yifu Qiu, Yftah Ziser, Anna Korhonen, E. Ponti, Shay B. Cohen. 23 May 2023. Tags: HILM.
13. "The Internal State of an LLM Knows When It's Lying." A. Azaria, Tom Michael Mitchell. 26 Apr 2023. Tags: HILM.