LLM Internal States Reveal Hallucination Risk Faced With a Query

3 July 2024
Ziwei Ji, Delong Chen, Etsuko Ishii, Samuel Cahyawijaya, Yejin Bang, Bryan Wilie, Pascale Fung
Topics: HILM, LRM

Papers citing "LLM Internal States Reveal Hallucination Risk Faced With a Query"

19 papers shown

  • Hallucination Detection using Multi-View Attention Features
    Yuya Ogasa, Yuki Arase · 06 Apr 2025

  • Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations
    Ziwei Ji, L. Yu, Yeskendir Koishekenov, Yejin Bang, Anthony Hartshorn, Alan Schelten, Cheng Zhang, Pascale Fung, Nicola Cancedda · 18 Mar 2025

  • High-Dimensional Interlingual Representations of Large Language Models
    Bryan Wilie, Samuel Cahyawijaya, Junxian He, Pascale Fung · 14 Mar 2025

  • HalluCounter: Reference-free LLM Hallucination Detection in the Wild!
    Ashok Urlana, Gopichand Kanumolu, Charaka Vinayak Kumar, B. Garlapati, Rahul Mishra · 06 Mar 2025 · HILM

  • Semantic Volume: Quantifying and Detecting both External and Internal Uncertainty in LLMs
    Xiaomin Li, Zhou Yu, Ziji Zhang, Yingying Zhuang, S. Narayanan Sadagopan, Anurag Beniwal · 28 Feb 2025 · HILM

  • What are Models Thinking about? Understanding Large Language Model Hallucinations "Psychology" through Model Inner State Analysis
    Peiran Wang, Yang Liu, Yunfei Lu, Jue Hong, Ye Wu · 20 Feb 2025 · HILM, LRM

  • The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking
    Yuchun Miao, Sen Zhang, Liang Ding, Yuqi Zhang, L. Zhang, Dacheng Tao · 31 Jan 2025

  • UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models
    Boyang Xue, Fei Mi, Qi Zhu, Hongru Wang, Rui Wang, Sheng Wang, Erxin Yu, Xuming Hu, Kam-Fai Wong · 16 Dec 2024 · HILM

  • Distinguishing Ignorance from Error in LLM Hallucinations
    Adi Simhi, Jonathan Herzig, Idan Szpektor, Yonatan Belinkov · 29 Oct 2024 · HILM

  • A Survey on the Honesty of Large Language Models
    Siheng Li, Cheng Yang, Taiqiang Wu, Chufan Shi, Yuji Zhang, ..., Jie Zhou, Yujiu Yang, Ngai Wong, Xixin Wu, Wai Lam · 27 Sep 2024 · HILM

  • Tell Me What You Don't Know: Enhancing Refusal Capabilities of Role-Playing Agents via Representation Space Analysis and Editing
    Wenhao Liu, Siyu An, Junru Lu, Muling Wu, Tianlong Li, Xiaohua Wang, Xiaoqing Zheng, Di Yin, Xing Sun, Xuanjing Huang · 25 Sep 2024

  • Attention Heads of Large Language Models: A Survey
    Zifan Zheng, Yezhaohui Wang, Yuxin Huang, Shichao Song, Mingchuan Yang, Bo Tang, Feiyu Xiong, Zhiyu Li · 05 Sep 2024 · LRM

  • Knowledge Mechanisms in Large Language Models: A Survey and Perspective
    Meng Wang, Yunzhi Yao, Ziwen Xu, Shuofei Qiao, Shumin Deng, ..., Yong-jia Jiang, Pengjun Xie, Fei Huang, Huajun Chen, Ningyu Zhang · 22 Jul 2024

  • Into the Unknown: Self-Learning Large Language Models
    Teddy Ferdinan, Jan Kocoń, P. Kazienko · 14 Feb 2024

  • Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness?
    Kevin Liu, Stephen Casper, Dylan Hadfield-Menell, Jacob Andreas · 27 Nov 2023 · HILM

  • Uncertainty Quantification with Pre-trained Language Models: A Large-Scale Empirical Analysis
    Yuxin Xiao, Paul Pu Liang, Umang Bhatt, W. Neiswanger, Ruslan Salakhutdinov, Louis-Philippe Morency · 10 Oct 2022

  • Generalized Out-of-Distribution Detection: A Survey
    Jingkang Yang, Kaiyang Zhou, Yixuan Li, Ziwei Liu · 21 Oct 2021

  • Probing Classifiers: Promises, Shortcomings, and Advances
    Yonatan Belinkov · 24 Feb 2021

  • Extracting Training Data from Large Language Models
    Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, ..., Tom B. Brown, D. Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel · 14 Dec 2020 · MLAU, SILM