Catch Me if You Search: When Contextual Web Search Results Affect the Detection of Hallucinations

While we increasingly rely on large language models (LLMs) for various tasks, these models are known to produce inaccurate content, or 'hallucinations', with potentially disastrous consequences. The recent integration of web search results into LLMs raises the question of whether people use them to verify generated content and thereby detect hallucinations accurately. An online experiment (N = 560) investigated how the provision of search results, either static (i.e., fixed search results provided by the LLM) or dynamic (i.e., participant-led searches), affects participants' perceived accuracy of LLM-generated content (i.e., genuine, minor hallucination, major hallucination), their self-confidence in accuracy ratings, and their overall evaluation of the LLM, compared to a control condition (i.e., no search results). Results showed that participants in both the static and dynamic conditions (vs. control) rated hallucinated content as less accurate and perceived the LLM more negatively. However, those in the dynamic condition rated genuine content as more accurate and demonstrated greater overall self-confidence in their assessments than those in the static or control conditions. We highlight practical implications of incorporating web search functionality into LLMs in real-world contexts.