(Im)possibility of Automated Hallucination Detection in Large Language Models

Main: 14 pages, Bibliography: 3 pages, Appendix: 3 pages
Abstract

Is automated hallucination detection possible? In this work, we introduce a theoretical framework to analyze the feasibility of automatically detecting hallucinations produced by large language models (LLMs). Inspired by the classical Gold-Angluin framework for language identification and its recent adaptation to language generation by Kleinberg and Mullainathan, we investigate whether an algorithm, trained on examples drawn from an unknown target language K (selected from a countable collection) and given access to an LLM, can reliably determine whether the LLM's outputs are correct or constitute hallucinations.
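As a rough illustration of the setup described above (not the paper's formal model), the sketch below stages a toy version of the detection problem in Python: a countable collection of candidate languages, an unknown target K, a stream of positive examples from K, and a detector that must decide whether a given output is a hallucination. The language family (multiples of an integer), the decision rule, and all names are hypothetical choices made for illustration.

```python
"""Toy sketch of the hallucination-detection setup (assumptions, not the paper's model):
languages are sets of integers L_n = {multiples of n}, the unknown target K is one of
them, and the detector only ever sees positive examples from K."""

from typing import Callable


def make_language(n: int) -> Callable[[int], bool]:
    """L_n: the set of positive multiples of n, as a membership test."""
    return lambda x: x % n == 0


# A (finite slice of a) countable collection of candidate languages.
COLLECTION = {n: make_language(n) for n in range(2, 50)}


def detect_hallucination(examples: list[int], output: int) -> bool:
    """Flag `output` as a hallucination only if it lies outside every language
    in the collection that is still consistent with the observed examples.
    This is one conservative decision rule, chosen purely for illustration."""
    consistent = [
        lang for lang in COLLECTION.values()
        if all(lang(x) for x in examples)
    ]
    return not any(lang(output) for lang in consistent)


if __name__ == "__main__":
    K = COLLECTION[6]                                   # unknown target language
    examples = [x for x in range(1, 200) if K(x)][:10]  # positive examples from K
    for y in [12, 35, 27, 44]:                          # candidate LLM outputs
        print(f"output={y:3d}  flagged={detect_hallucination(examples, y)!s:5}  "
              f"actually in K={K(y)}")
```

Running the sketch, 35 is correctly flagged, but 27 and 44 slip through because smaller languages (multiples of 2 or 3) remain consistent with the positive examples, which hints at why detection from positive examples alone can be hard, the kind of difficulty the paper studies formally.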
