arXiv:2406.19501
Monitoring Latent World States in Language Models with Propositional Probes
27 June 2024
Jiahai Feng, Stuart Russell, Jacob Steinhardt
HILM
Papers citing "Monitoring Latent World States in Language Models with Propositional Probes" (13 papers)
Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models
Guy Kaplan, Michael Toker, Yuval Reif, Yonatan Belinkov, Roy Schwartz
DiffM
01 Apr 2025
Mind the Gap: Bridging the Divide Between AI Aspirations and the Reality of Autonomous Characterization
Grace Guinan, Addison Salvador, Michelle A. Smeaton, Andrew Glaws, Hilary Egan, Brian C. Wyatt, Babak Anasori, K. Fiedler, M. Olszta, Steven Spurgeon
25 Feb 2025
Mechanistic Interpretability of Emotion Inference in Large Language Models
Ala Nekouvaght Tak, Amin Banayeeanzade, Anahita Bolourani, Mina Kian, Robin Jia, Jonathan Gratch
08 Feb 2025
Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages
Jannik Brinkmann, Chris Wendler, Christian Bartelt, Aaron Mueller
10 Jan 2025
How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis
Guan Zhe Hong, Nishanth Dikkala, Enming Luo, Cyrus Rashtchian, Xin Wang, Rina Panigrahy
OffRL, LRM, NAI
06 Nov 2024
Relational Composition in Neural Networks: A Survey and Call to Action
Martin Wattenberg, Fernanda Viégas
CoGe
19 Jul 2024
Towards Understanding Sycophancy in Language Models
Mrinank Sharma, Meg Tong, Tomasz Korbak, D. Duvenaud, Amanda Askell, ..., Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, Ethan Perez
20 Oct 2023
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
Samuel Marks, Max Tegmark
HILM
10 Oct 2023
The System Model and the User Model: Exploring AI Dashboard Design
Fernanda Viégas, Martin Wattenberg
04 May 2023
Entity Tracking in Language Models
Najoung Kim, Sebastian Schuster
03 May 2023
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
Atticus Geiger, Zhengxuan Wu, Christopher Potts, Thomas F. Icard, Noah D. Goodman
CML
05 Mar 2023
Leveraging Large Language Models for Multiple Choice Question Answering
Joshua Robinson, Christopher Rytting, David Wingate
ELM
22 Oct 2022
BBQ: A Hand-Built Bias Benchmark for Question Answering
Alicia Parrish, Angelica Chen, Nikita Nangia, Vishakh Padmakumar, Jason Phang, Jana Thompson, Phu Mon Htut, Sam Bowman
15 Oct 2021