HaluEval-Wild: Evaluating Hallucinations of Language Models in the Wild
Zhiying Zhu, Yiming Yang, Zhiqing Sun
arXiv:2403.04307 · 7 March 2024
Tags: HILM, VLM
Papers citing "HaluEval-Wild: Evaluating Hallucinations of Language Models in the Wild" (13 papers shown)
1. "Real-World Gaps in AI Governance Research." Ilan Strauss, Isobel Moure, Tim O'Reilly, Sruly Rosenblat. 30 Apr 2025.
2. "OAEI-LLM-T: A TBox Benchmark Dataset for Understanding Large Language Model Hallucinations in Ontology Matching." Zhangcheng Qiang, Kerry Taylor, Weiqing Wang, Jing Jiang. 25 Mar 2025.
3. "EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants." Franck Cappello, Sandeep Madireddy, Robert Underwood, N. Getty, Nicholas Chia, ..., M. Rafique, Eliu A. Huerta, B. Li, Ian Foster, Rick L. Stevens. 27 Feb 2025.
4. "The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input." Alon Jacovi, Andrew Wang, Chris Alberti, Connie Tao, Jon Lipovetz, ..., Rachana Fellinger, Rui Wang, Zizhao Zhang, Sasha Goldshtein, Dipanjan Das. 06 Jan 2025. Tags: HILM, ALM.
5. "WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries." Wenting Zhao, Tanya Goyal, Yu Ying Chiu, Liwei Jiang, Benjamin Newman, ..., Khyathi Raghavi Chandu, Ronan Le Bras, Claire Cardie, Yuntian Deng, Yejin Choi. 24 Jul 2024. Tags: HILM.
6. "From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty." Maor Ivgi, Ori Yoran, Jonathan Berant, Mor Geva. 08 Jul 2024. Tags: HILM.
7. "Enhancing Hallucination Detection through Perturbation-Based Synthetic Data Generation in System Responses." Dongxu Zhang, Varun Gangal, B. Lattimer, Yi Yang. 07 Jul 2024.
8. "Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness." Khyathi Raghavi Chandu, Linjie Li, Anas Awadalla, Ximing Lu, Jae Sung Park, Jack Hessel, Lijuan Wang, Yejin Choi. 02 Jul 2024.
9. "DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation." A. B. M. A. Rahman, Saeed Anwar, Muhammad Usman, Ajmal Mian. 13 Jun 2024. Tags: HILM.
10. "CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks." Maciej Besta, Lorenzo Paleari, Aleš Kubíček, Piotr Nyczyk, Robert Gerstenberger, Patrick Iff, Tomasz Lehmann, H. Niewiadomski, Torsten Hoefler. 04 Jun 2024.
11. "AI Governance and Accountability: An Analysis of Anthropic's Claude." Aman Priyanshu, Yash Maurya, Zuofei Hong. 02 May 2024.
12. "Detecting and Mitigating Hallucinations in Multilingual Summarisation." Yifu Qiu, Yftah Ziser, Anna Korhonen, E. Ponti, Shay B. Cohen. 23 May 2023. Tags: HILM.
13. "The Internal State of an LLM Knows When It's Lying." A. Azaria, Tom Michael Mitchell. 26 Apr 2023. Tags: HILM.