arXiv:2312.03729
Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness?
27 November 2023
Kevin Liu
Stephen Casper
Dylan Hadfield-Menell
Jacob Andreas
HILM
Papers citing
"Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness?"
10 / 10 papers shown
What do Language Model Probabilities Represent? From Distribution Estimation to Response Prediction
Eitan Wagner
Omri Abend
27
0
0
04 May 2025
Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based Evaluation
Dongryeol Lee
Yerin Hwang
Yongil Kim
Joonsuk Park
Kyomin Jung
ELM
65
4
0
28 Oct 2024
Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation
Yiming Wang
Pei Zhang
Baosong Yang
Derek F. Wong
Rui-cang Wang
LRM
40
4
0
17 Oct 2024
Exploring Multilingual Probing in Large Language Models: A Cross-Language Analysis
Daoyang Li
Mingyu Jin
Qingcheng Zeng
Mengnan Du
48
2
0
22 Sep 2024
Insights into LLM Long-Context Failures: When Transformers Know but Don't Tell
Taiming Lu
Muhan Gao
Kuai Yu
Adam Byerly
Daniel Khashabi
34
11
0
20 Jun 2024
Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward
Xuan Xie
Jiayang Song
Zhehua Zhou
Yuheng Huang
Da Song
Lei Ma
OffRL
35
6
0
12 Apr 2024
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
Samuel Marks
Max Tegmark
HILM
91
164
0
10 Oct 2023
Truthful AI: Developing and governing AI that does not lie
Owain Evans
Owen Cotton-Barratt
Lukas Finnveden
Adam Bales
Avital Balwit
Peter Wills
Luca Righetti
William Saunders
HILM
217
107
0
13 Oct 2021
Probing Classifiers: Promises, Shortcomings, and Advances
Yonatan Belinkov
221
291
0
24 Feb 2021
Language Models as Knowledge Bases?
Fabio Petroni
Tim Rocktäschel
Patrick Lewis
A. Bakhtin
Yuxiang Wu
Alexander H. Miller
Sebastian Riedel
KELM
AI4MH
393
2,216
0
03 Sep 2019