Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness?
27 November 2023
Kevin Liu, Stephen Casper, Dylan Hadfield-Menell, Jacob Andreas
HILM

Papers citing "Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness?"

10 / 10 papers shown

What do Language Model Probabilities Represent? From Distribution Estimation to Response Prediction
Eitan Wagner, Omri Abend
04 May 2025

Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based Evaluation
Dongryeol Lee, Yerin Hwang, Yongil Kim, Joonsuk Park, Kyomin Jung
ELM
28 Oct 2024

Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation
Yiming Wang, Pei Zhang, Baosong Yang, Derek F. Wong, Rui Wang
LRM
17 Oct 2024

Exploring Multilingual Probing in Large Language Models: A Cross-Language Analysis
Daoyang Li, Mingyu Jin, Qingcheng Zeng, Mengnan Du
22 Sep 2024

Insights into LLM Long-Context Failures: When Transformers Know but Don't Tell
Taiming Lu, Muhan Gao, Kuai Yu, Adam Byerly, Daniel Khashabi
20 Jun 2024

Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward
Xuan Xie, Jiayang Song, Zhehua Zhou, Yuheng Huang, Da Song, Lei Ma
OffRL
12 Apr 2024

The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
Samuel Marks, Max Tegmark
HILM
10 Oct 2023

Truthful AI: Developing and governing AI that does not lie
Owain Evans, Owen Cotton-Barratt, Lukas Finnveden, Adam Bales, Avital Balwit, Peter Wills, Luca Righetti, William Saunders
HILM
13 Oct 2021

Probing Classifiers: Promises, Shortcomings, and Advances
Yonatan Belinkov
24 Feb 2021

Language Models as Knowledge Bases?
Fabio Petroni, Tim Rocktäschel, Patrick Lewis, A. Bakhtin, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel
KELM, AI4MH
03 Sep 2019