LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
arXiv:2410.02707, submitted 3 October 2024
International Conference on Learning Representations (ICLR), 2025
Hadas Orgad, Michael Toker, Zorik Gekhman, Roi Reichart, Idan Szpektor, Hadas Kotek, Yonatan Belinkov
Tags: HILM, AIFin
Links: arXiv (abs), PDF, HTML, HuggingFace

Papers citing "LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations"

50 / 131 papers shown
The Illusion of Certainty: Uncertainty quantification for LLMs fails under ambiguity. Tim Tomov, Dominik Fuchsgruber, Tom Wollschlager, Stephan Günnemann. 06 Nov 2025.
RepV: Safety-Separable Latent Spaces for Scalable Neurosymbolic Plan Verification. Yunhao Yang, N. Bhatt, Pranay Samineni, Rohan Siva, Zhanyang Wang, Ufuk Topcu. 30 Oct 2025.
HACK: Hallucinations Along Certainty and Knowledge Axes. Adi Simhi, Jonathan Herzig, Itay Itzhak, Dana Arad, Zorik Gekhman, Roi Reichart, Fazl Barez, Gabriel Stanovsky, Idan Szpektor, Yonatan Belinkov. 28 Oct 2025.
Do Stop Me Now: Detecting Boilerplate Responses with a Single Iteration. Yuval Kainan, Shaked Zychlinski. 26 Oct 2025.
Mixture-of-Minds: Multi-Agent Reinforcement Learning for Table Understanding. Yuhang Zhou, Mingrui Zhang, Ke Li, Mingyi Wang, Qiao Liu, ..., Mingze Gao, Abhishek Kumar, Xiangjun Fan, Zhuokai Zhao, Lizhu Zhang. 23 Oct 2025. [LLMAG, LRM]
CARES: Context-Aware Resolution Selector for VLMs. Moshe Kimhi, Nimrod Shabtay, Raja Giryes, Chaim Baskin, Eli Schwartz. 22 Oct 2025. [VLM]
Train for Truth, Keep the Skills: Binary Retrieval-Augmented Reward Mitigates Hallucinations. Tong Chen, Akari Asai, Luke Zettlemoyer, Hannaneh Hajishirzi, Faeze Brahman. 20 Oct 2025. [OffRL, HILM, LRM]
Emergence of Linear Truth Encodings in Language Models. Shauli Ravfogel, Gilad Yehudai, Tal Linzen, Joan Bruna, A. Bietti. 17 Oct 2025. [KELM]
LLM Knowledge is Brittle: Truthfulness Representations Rely on Superficial Resemblance. Patrick Haller, Mark Ibrahim, Polina Kirichenko, Levent Sagun, Samuel J. Bell. 13 Oct 2025. [KELM]
Large Language Models Do NOT Really Know What They Don't Know. C. Cheang, Hou Pong Chan, Wenxuan Zhang, Yang Deng. 10 Oct 2025. [HILM]
Weak Form Learning for Mean-Field Partial Differential Equations: an Application to Insect Movement. Seth Minor, Bret D. Elderd, Benjamin Van Allen, David M. Bortz, Vanja M. Dukic. 09 Oct 2025.
LLM Microscope: What Model Internals Reveal About Answer Correctness and Context Utilization. Jiarui Liu, Jivitesh Jain, Mona T. Diab, Nishant Subramani. 05 Oct 2025.
Beyond Token Probes: Hallucination Detection via Activation Tensors with ACT-ViT. Guy Bar-Shalom, Fabrizio Frasca, Yaniv Galron, Yftah Ziser, Haggai Maron. 30 Sep 2025. [MLLM]
TraceDet: Hallucination Detection from the Decoding Trace of Diffusion Large Language Models. Shenxu Chang, Junchi Yu, Weixing Wang, Yongqiang Chen, Jialin Yu, Philip Torr, Jindong Gu. 30 Sep 2025. [HILM]
Neural Message-Passing on Attention Graphs for Hallucination Detection. Fabrizio Frasca, Guy Bar-Shalom, Yftah Ziser, Haggai Maron. 29 Sep 2025.
Reference-Free Rating of LLM Responses via Latent Information. Leander Girrbach, Chi-Ping Su, Tankred Saanum, Richard Socher, Eric Schulz, Zeynep Akata. 29 Sep 2025.
Bridging the Knowledge-Prediction Gap in LLMs on Multiple-Choice Questions. Yoonah Park, Haesung Pyun, Yohan Jo. 28 Sep 2025. [KELM]
Estimating Semantic Alphabet Size for LLM Uncertainty Quantification. Lucas H. McCabe, Rimon Melamed, Thomas Hartvigsen, H. H. Huang. 17 Sep 2025.
Decoding Memories: An Efficient Pipeline for Self-Consistency Hallucination Detection. Weizhi Gao, Xiaorui Liu, Feiyi Wang, Dan Lu, Junqi Yin. 28 Aug 2025. [HILM]
Real-Time Detection of Hallucinated Entities in Long-Form Generation. Oscar Obeso, Andy Arditi, Javier Ferrando, Joshua Freeman, Cameron Holmes, Neel Nanda. 26 Aug 2025. [HILM]
Answering the Unanswerable Is to Err Knowingly: Analyzing and Mitigating Abstention Failures in Large Reasoning Models. Yi Liu, Xiangyu Liu, Zequn Sun, Wei Hu. 26 Aug 2025.
Trustworthy Agents for Electronic Health Records through Confidence Estimation. Yongwoo Song, Minbyul Jeong, Mujeen Sung. 26 Aug 2025. [HILM]
Beyond Transcription: Mechanistic Interpretability in ASR. Neta Glazer, Yael Segal-Feldman, Hilit Segev, Aviv Shamsian, Asaf Buchnick, Gill Hetz, Ethan Fetaya, Joseph Keshet, Aviv Navon. 21 Aug 2025.
Prompt-Induced Linguistic Fingerprints for LLM-Generated Fake News Detection. Chi Wang, Min Gao, Zongwei Wang, Junwei Yin, Kai Shu, Chenghua Lin. 18 Aug 2025. [DeLMO]
Can LLMs Detect Their Confabulations? Estimating Reliability in Uncertainty-Aware Language Models. Tianyi Zhou, Johanne Medina, Sanjay Chawla. 11 Aug 2025. [HILM]
The Illusion of Progress: Re-evaluating Hallucination Detection in LLMs. Denis Janiak, Jakub Binkowski, Albert Sawczyn, Bogdan Gabrys, Ravid Schwartz-Ziv, Tomasz Kajdanowicz. 01 Aug 2025. [HILM]
HiProbe-VAD: Video Anomaly Detection via Hidden States Probing in Tuning-Free Multimodal LLMs. Zhaolin Cai, Fan Li, Ziwei Zheng, Yanjun Qin. 23 Jul 2025.
ICR Probe: Tracking Hidden State Dynamics for Reliable Hallucination Detection in LLMs. Zhenliang Zhang, Xinyu Hu, Huixuan Zhang, Junzhe Zhang, Xiaojun Wan. Annual Meeting of the Association for Computational Linguistics (ACL), 2025. 22 Jul 2025. [HILM]
Extracting Visual Facts from Intermediate Layers for Mitigating Hallucinations in Multimodal Large Language Models. Haoran Zhou, Zihan Zhang, Hao Chen. 21 Jul 2025.
Large Language Models Encode Semantics in Low-Dimensional Linear Subspaces. Baturay Saglam, Paul Kassianik, Blaine Nelson, Sajana Weerawardhena, Yaron Singer, Amin Karbasi. 13 Jul 2025.
Persona Features Control Emergent Misalignment. Miles Wang, Tom Dupré la Tour, Olivia Watkins, Alex Makelov, Ryan A. Chi, ..., Jeffrey Wang, Achyuta Rajaram, Johannes Heidecke, Tejal Patwardhan, Dan Mossing. 24 Jun 2025.
The Geometries of Truth Are Orthogonal Across Tasks. Waiss Azizian, Michael Kirchhof, Eugène Ndiaye, Louis Béthune, Stephen Zhang, Pierre Ablin, Marco Cuturi. 10 Jun 2025.
CLATTER: Comprehensive Entailment Reasoning for Hallucination Detection. Ron Eliav, Arie Cattan, Eran Hirsch, Shahaf Bassan, Elias Stengel-Eskin, Mohit Bansal, Ido Dagan. 05 Jun 2025. [LRM]
Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety. Seongmin Lee, Aeree Cho, Grace C. Kim, ShengYun Peng, Mansi Phute, Duen Horng Chau. 05 Jun 2025. [LM&MA, AI4CE]
Growing Through Experience: Scaling Episodic Grounding in Language Models. Chunhui Zhang, Sirui Wang, Z. Ouyang, Xiangchi Yuan, Soroush Vosoughi. Annual Meeting of the Association for Computational Linguistics (ACL), 2025. 02 Jun 2025. [CLL]
HD-NDEs: Neural Differential Equations for Hallucination Detection in LLMs. Qing Li, Fauzan Farooqui, Zongxiong Chen, Derui Zhu, Yuxia Wang, Congbo Ma, Chenyang Lyu, Fakhri Karray. Annual Meeting of the Association for Computational Linguistics (ACL), 2025. 30 May 2025.
Whose Name Comes Up? Auditing LLM-Based Scholar Recommendations. Daniele Barolo, Chiara Valentin, Fariba Karimi, Luis Galárraga, Gonzalo G. Méndez, Lisette Espín-Noboa. 29 May 2025.
How Does Response Length Affect Long-Form Factuality. James Xu Zhao, Jimmy Z.J. Liu, Bryan Hooi, See-Kiong Ng. Annual Meeting of the Association for Computational Linguistics (ACL), 2025. 29 May 2025. [HILM, KELM]
Data-efficient Meta-models for Evaluation of Context-based Questions and Answers in LLMs. Julia Belikova, Konstantin Polev, Rauf Parchiev, Dmitry Simakov. Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025. 29 May 2025.
Too Consistent to Detect: A Study of Self-Consistent Errors in LLMs. Hexiang Tan, Fei Sun, Sha Liu, Du Su, Qi Cao, ..., Jingang Wang, Xunliang Cai, Yuanzhuo Wang, Huawei Shen, Xueqi Cheng. 23 May 2025. [HILM]
When Do LLMs Admit Their Mistakes? Understanding the Role of Model Belief in Retraction. Yuqing Yang, Robin Jia. 22 May 2025. [KELM, LRM]
RePPL: Recalibrating Perplexity by Uncertainty in Semantic Propagation and Language Generation for Explainable QA Hallucination Detection. Yiming Huang, Junyan Zhang, Zihao Wang, Biquan Bie, Xuming Hu, Yi R. Fung. 21 May 2025.
Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs. Hao Wang, Pinzhi Huang, Jihan Yang, Saining Xie, Daisuke Kawahara. 21 May 2025.
HCRMP: A LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving. Zhiwen Chen, Bo Leng, Zhuoren Li, Hanming Deng, Guizhe Jin, Ran Yu, Huanxi Wen. 21 May 2025.
Void in Language Models. Mani Shemiranifar. 20 May 2025.
Truth Neurons. Haohang Li, Yun Feng, Yangyang Yu, Jordan W. Suchow, Zining Zhu. 18 May 2025. [HILM, MILM, KELM]
Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors. Jing Huang, Junyi Tao, Thomas Icard, Diyi Yang, Christopher Potts. 17 May 2025. [OODD]
Revealing economic facts: LLMs know more than they say. Marcus Buckmann, Quynh Anh Nguyen, Edward Hill. 13 May 2025.
Investigating task-specific prompts and sparse autoencoders for activation monitoring. Henk Tillman, Dan Mossing. 28 Apr 2025. [LLMSV]
The Geometry of Self-Verification in a Task-Specific Reasoning Model. Andrew Lee, Lihao Sun, Chris Wendler, Fernanda Viégas, Martin Wattenberg. 19 Apr 2025. [LRM]