(Im)possibility of Automated Hallucination Detection in Large Language Models
Main: 14 pages; Bibliography: 3 pages; Appendix: 3 pages.
Abstract
Is automated hallucination detection possible? In this work, we introduce a theoretical framework to analyze the feasibility of automatically detecting hallucinations produced by large language models (LLMs). Inspired by the classical Gold-Angluin framework for language identification and its recent adaptation to language generation by Kleinberg and Mullainathan, we investigate whether an algorithm, trained on examples drawn from an unknown target language (selected from a countable collection) and given access to an LLM, can reliably determine whether the LLM's outputs are correct or constitute hallucinations.
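As an illustrative sketch only (the notation below is not taken from the abstract and may differ from the paper's actual definitions), the detection task can be formalized in the spirit of the Gold-Angluin model roughly as follows. Fix a countable collection of candidate languages $\mathcal{L} = \{K_1, K_2, \dots\}$ over a countable domain of strings. An adversary selects a target $K \in \mathcal{L}$ and an enumeration $x_1, x_2, \dots$ of examples from $K$. At each step $n$, a detector $D$ observes the training examples seen so far together with a candidate output $w_n$ produced by the LLM and emits a label
$$ D(x_1, \dots, x_n, w_n) \in \{\textsf{correct}, \textsf{hallucination}\}, $$
with the intended semantics that the label is $\textsf{correct}$ if and only if $w_n \in K$. Detection in the limit would then require that, for every $K \in \mathcal{L}$ and every enumeration of $K$, the detector's labels are correct for all but finitely many $n$. This sketch only fixes plausible notation for the informal description above.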
