(Im)possibility of Automated Hallucination Detection in Large Language Models

Main: 14 pages, Bibliography: 3 pages, Appendix: 3 pages
Abstract

Is automated hallucination detection possible? In this work, we introduce a theoretical framework to analyze the feasibility of automatically detecting hallucinations produced by large language models (LLMs). Inspired by the classical Gold-Angluin framework for language identification and its recent adaptation to language generation by Kleinberg and Mullainathan, we investigate whether an algorithm, trained on examples drawn from an unknown target language K (selected from a countable collection) and given access to an LLM, can reliably determine whether the LLM's outputs are correct or constitute hallucinations.
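As a rough illustration of the setup described above (not the paper's formal model), the sketch below stages a toy version of the detection problem in Python: a countable collection of candidate languages, an unknown target K, a stream of positive examples from K, and a detector that must decide whether a given output is a hallucination. The language family (multiples of an integer), the decision rule, and all names are hypothetical choices made for illustration.

```python
"""Toy sketch of the hallucination-detection setup (assumptions, not the paper's model):
languages are sets of integers L_n = {multiples of n}, the unknown target K is one of
them, and the detector only ever sees positive examples from K."""

from typing import Callable


def make_language(n: int) -> Callable[[int], bool]:
    """L_n: the set of positive multiples of n, as a membership test."""
    return lambda x: x % n == 0


# A (finite slice of a) countable collection of candidate languages.
COLLECTION = {n: make_language(n) for n in range(2, 50)}


def detect_hallucination(examples: list[int], output: int) -> bool:
    """Flag `output` as a hallucination only if it lies outside every language
    in the collection that is still consistent with the observed examples.
    This is one conservative decision rule, chosen purely for illustration."""
    consistent = [
        lang for lang in COLLECTION.values()
        if all(lang(x) for x in examples)
    ]
    return not any(lang(output) for lang in consistent)


if __name__ == "__main__":
    K = COLLECTION[6]                                   # unknown target language
    examples = [x for x in range(1, 200) if K(x)][:10]  # positive examples from K
    for y in [12, 35, 27, 44]:                          # candidate LLM outputs
        print(f"output={y:3d}  flagged={detect_hallucination(examples, y)!s:5}  "
              f"actually in K={K(y)}")
```

Running the sketch, 35 is correctly flagged, but 27 and 44 slip through because smaller languages (multiples of 2 or 3) remain consistent with the positive examples, which hints at why detection from positive examples alone can be hard, the kind of difficulty the paper studies formally.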
