LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
arXiv:2410.02707, submitted 3 October 2024
International Conference on Learning Representations (ICLR), 2025
Hadas Orgad, Michael Toker, Zorik Gekhman, Roi Reichart, Idan Szpektor, Hadas Kotek, Yonatan Belinkov
Tags: HILM, AIFin
Links: arXiv (abs), PDF, HTML, HuggingFace

Papers citing "LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations"

50 / 131 papers shown
The Illusion of Certainty: Uncertainty quantification for LLMs fails under ambiguity. Tim Tomov, Dominik Fuchsgruber, Tom Wollschlager, Stephan Günnemann. 06 Nov 2025.
RepV: Safety-Separable Latent Spaces for Scalable Neurosymbolic Plan Verification. Yunhao Yang, N. Bhatt, Pranay Samineni, Rohan Siva, Zhanyang Wang, Ufuk Topcu. 30 Oct 2025.
HACK: Hallucinations Along Certainty and Knowledge Axes. Adi Simhi, Jonathan Herzig, Itay Itzhak, Dana Arad, Zorik Gekhman, Roi Reichart, Fazl Barez, Gabriel Stanovsky, Idan Szpektor, Yonatan Belinkov. 28 Oct 2025.
Do Stop Me Now: Detecting Boilerplate Responses with a Single Iteration. Yuval Kainan, Shaked Zychlinski. 26 Oct 2025.
Mixture-of-Minds: Multi-Agent Reinforcement Learning for Table Understanding. Yuhang Zhou, Mingrui Zhang, Ke Li, Mingyi Wang, Qiao Liu, ..., Mingze Gao, Abhishek Kumar, Xiangjun Fan, Zhuokai Zhao, Lizhu Zhang. 23 Oct 2025. [LLMAG, LRM]
CARES: Context-Aware Resolution Selector for VLMs. Moshe Kimhi, Nimrod Shabtay, Raja Giryes, Chaim Baskin, Eli Schwartz. 22 Oct 2025. [VLM]
Train for Truth, Keep the Skills: Binary Retrieval-Augmented Reward Mitigates Hallucinations. Tong Chen, Akari Asai, Luke Zettlemoyer, Hannaneh Hajishirzi, Faeze Brahman. 20 Oct 2025. [OffRL, HILM, LRM]
Emergence of Linear Truth Encodings in Language Models. Shauli Ravfogel, Gilad Yehudai, Tal Linzen, Joan Bruna, A. Bietti. 17 Oct 2025. [KELM]
LLM Knowledge is Brittle: Truthfulness Representations Rely on Superficial Resemblance. Patrick Haller, Mark Ibrahim, Polina Kirichenko, Levent Sagun, Samuel J. Bell. 13 Oct 2025. [KELM]
Large Language Models Do NOT Really Know What They Don't Know. C. Cheang, Hou Pong Chan, Wenxuan Zhang, Yang Deng. 10 Oct 2025. [HILM]
Weak Form Learning for Mean-Field Partial Differential Equations: an Application to Insect Movement. Seth Minor, Bret D. Elderd, Benjamin Van Allen, David M. Bortz, Vanja M. Dukic. 09 Oct 2025.
LLM Microscope: What Model Internals Reveal About Answer Correctness and Context Utilization. Jiarui Liu, Jivitesh Jain, Mona T. Diab, Nishant Subramani. 05 Oct 2025.
Beyond Token Probes: Hallucination Detection via Activation Tensors with ACT-ViT. Guy Bar-Shalom, Fabrizio Frasca, Yaniv Galron, Yftah Ziser, Haggai Maron. 30 Sep 2025. [MLLM]
TraceDet: Hallucination Detection from the Decoding Trace of Diffusion Large Language Models. Shenxu Chang, Junchi Yu, Weixing Wang, Yongqiang Chen, Jialin Yu, Philip Torr, Jindong Gu. 30 Sep 2025. [HILM]
Neural Message-Passing on Attention Graphs for Hallucination Detection. Fabrizio Frasca, Guy Bar-Shalom, Yftah Ziser, Haggai Maron. 29 Sep 2025.
Reference-Free Rating of LLM Responses via Latent Information. Leander Girrbach, Chi-Ping Su, Tankred Saanum, Richard Socher, Eric Schulz, Zeynep Akata. 29 Sep 2025.
Bridging the Knowledge-Prediction Gap in LLMs on Multiple-Choice Questions. Yoonah Park, Haesung Pyun, Yohan Jo. 28 Sep 2025. [KELM]
Estimating Semantic Alphabet Size for LLM Uncertainty Quantification. Lucas H. McCabe, Rimon Melamed, Thomas Hartvigsen, H. H. Huang. 17 Sep 2025.
Decoding Memories: An Efficient Pipeline for Self-Consistency Hallucination Detection. Weizhi Gao, Xiaorui Liu, Feiyi Wang, Dan Lu, Junqi Yin. 28 Aug 2025. [HILM]
Real-Time Detection of Hallucinated Entities in Long-Form Generation. Oscar Obeso, Andy Arditi, Javier Ferrando, Joshua Freeman, Cameron Holmes, Neel Nanda. 26 Aug 2025. [HILM]
Answering the Unanswerable Is to Err Knowingly: Analyzing and Mitigating Abstention Failures in Large Reasoning Models. Yi Liu, Xiangyu Liu, Zequn Sun, Wei Hu. 26 Aug 2025.
Trustworthy Agents for Electronic Health Records through Confidence Estimation. Yongwoo Song, Minbyul Jeong, Mujeen Sung. 26 Aug 2025. [HILM]
Beyond Transcription: Mechanistic Interpretability in ASR. Neta Glazer, Yael Segal-Feldman, Hilit Segev, Aviv Shamsian, Asaf Buchnick, Gill Hetz, Ethan Fetaya, Joseph Keshet, Aviv Navon. 21 Aug 2025.
Prompt-Induced Linguistic Fingerprints for LLM-Generated Fake News Detection. Chi Wang, Min Gao, Zongwei Wang, Junwei Yin, Kai Shu, Chenghua Lin. 18 Aug 2025. [DeLMO]
Can LLMs Detect Their Confabulations? Estimating Reliability in Uncertainty-Aware Language Models. Tianyi Zhou, Johanne Medina, Sanjay Chawla. 11 Aug 2025. [HILM]
The Illusion of Progress: Re-evaluating Hallucination Detection in LLMs. Denis Janiak, Jakub Binkowski, Albert Sawczyn, Bogdan Gabrys, Ravid Schwartz-Ziv, Tomasz Kajdanowicz. 01 Aug 2025. [HILM]
HiProbe-VAD: Video Anomaly Detection via Hidden States Probing in Tuning-Free Multimodal LLMs. Zhaolin Cai, Fan Li, Ziwei Zheng, Yanjun Qin. 23 Jul 2025.
ICR Probe: Tracking Hidden State Dynamics for Reliable Hallucination Detection in LLMs. Zhenliang Zhang, Xinyu Hu, Huixuan Zhang, Junzhe Zhang, Xiaojun Wan. Annual Meeting of the Association for Computational Linguistics (ACL), 2025. 22 Jul 2025. [HILM]
Extracting Visual Facts from Intermediate Layers for Mitigating Hallucinations in Multimodal Large Language Models. Haoran Zhou, Zihan Zhang, Hao Chen. 21 Jul 2025.
Large Language Models Encode Semantics in Low-Dimensional Linear Subspaces. Baturay Saglam, Paul Kassianik, Blaine Nelson, Sajana Weerawardhena, Yaron Singer, Amin Karbasi. 13 Jul 2025.
Persona Features Control Emergent Misalignment. Miles Wang, Tom Dupré la Tour, Olivia Watkins, Alex Makelov, Ryan A. Chi, ..., Jeffrey Wang, Achyuta Rajaram, Johannes Heidecke, Tejal Patwardhan, Dan Mossing. 24 Jun 2025.
The Geometries of Truth Are Orthogonal Across Tasks. Waiss Azizian, Michael Kirchhof, Eugène Ndiaye, Louis Béthune, Stephen Zhang, Pierre Ablin, Marco Cuturi. 10 Jun 2025.
CLATTER: Comprehensive Entailment Reasoning for Hallucination Detection. Ron Eliav, Arie Cattan, Eran Hirsch, Shahaf Bassan, Elias Stengel-Eskin, Mohit Bansal, Ido Dagan. 05 Jun 2025. [LRM]
Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety. Seongmin Lee, Aeree Cho, Grace C. Kim, ShengYun Peng, Mansi Phute, Duen Horng Chau. 05 Jun 2025. [LM&MA, AI4CE]
Growing Through Experience: Scaling Episodic Grounding in Language Models. Chunhui Zhang, Sirui Wang, Z. Ouyang, Xiangchi Yuan, Soroush Vosoughi. Annual Meeting of the Association for Computational Linguistics (ACL), 2025. 02 Jun 2025. [CLL]
HD-NDEs: Neural Differential Equations for Hallucination Detection in LLMs. Qing Li, Fauzan Farooqui, Zongxiong Chen, Derui Zhu, Yuxia Wang, Congbo Ma, Chenyang Lyu, Fakhri Karray. Annual Meeting of the Association for Computational Linguistics (ACL), 2025. 30 May 2025.
Whose Name Comes Up? Auditing LLM-Based Scholar Recommendations. Daniele Barolo, Chiara Valentin, Fariba Karimi, Luis Galárraga, Gonzalo G. Méndez, Lisette Espín-Noboa. 29 May 2025.
How Does Response Length Affect Long-Form Factuality. James Xu Zhao, Jimmy Z.J. Liu, Bryan Hooi, See-Kiong Ng. Annual Meeting of the Association for Computational Linguistics (ACL), 2025. 29 May 2025. [HILM, KELM]
Data-efficient Meta-models for Evaluation of Context-based Questions and Answers in LLMs. Julia Belikova, Konstantin Polev, Rauf Parchiev, Dmitry Simakov. Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025. 29 May 2025.
Too Consistent to Detect: A Study of Self-Consistent Errors in LLMs. Hexiang Tan, Fei Sun, Sha Liu, Du Su, Qi Cao, ..., Jingang Wang, Xunliang Cai, Yuanzhuo Wang, Huawei Shen, Xueqi Cheng. 23 May 2025. [HILM]
When Do LLMs Admit Their Mistakes? Understanding the Role of Model Belief in Retraction. Yuqing Yang, Robin Jia. 22 May 2025. [KELM, LRM]
RePPL: Recalibrating Perplexity by Uncertainty in Semantic Propagation and Language Generation for Explainable QA Hallucination Detection. Yiming Huang, Junyan Zhang, Zihao Wang, Biquan Bie, Xuming Hu, Yi R. Fung. 21 May 2025.
Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs. Hao Wang, Pinzhi Huang, Jihan Yang, Saining Xie, Daisuke Kawahara. 21 May 2025.
HCRMP: A LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving. Zhiwen Chen, Bo Leng, Zhuoren Li, Hanming Deng, Guizhe Jin, Ran Yu, Huanxi Wen. 21 May 2025.
Void in Language Models. Mani Shemiranifar. 20 May 2025.
Truth Neurons. Haohang Li, Yun Feng, Yangyang Yu, Jordan W. Suchow, Zining Zhu. 18 May 2025. [HILM, MILM, KELM]
Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors. Jing Huang, Junyi Tao, Thomas Icard, Diyi Yang, Christopher Potts. 17 May 2025. [OODD]
Revealing economic facts: LLMs know more than they say. Marcus Buckmann, Quynh Anh Nguyen, Edward Hill. 13 May 2025.
Investigating task-specific prompts and sparse autoencoders for activation monitoring. Henk Tillman, Dan Mossing. 28 Apr 2025. [LLMSV]
The Geometry of Self-Verification in a Task-Specific Reasoning Model. Andrew Lee, Lihao Sun, Chris Wendler, Fernanda Viégas, Martin Wattenberg. 19 Apr 2025. [LRM]