Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2310.02207
Cited By
Language Models Represent Space and Time
3 October 2023
Wes Gurnee
Max Tegmark
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Language Models Represent Space and Time"
44 / 44 papers shown
Title
Revealing economic facts: LLMs know more than they say
Marcus Buckmann
Quynh Anh Nguyen
Edward Hill
21
0
0
13 May 2025
Geospatial Mechanistic Interpretability of Large Language Models
Stef De Sabbata
Stefano Mizzaro
Kevin Roitero
AI4CE
28
0
0
06 May 2025
Do Large Language Models know who did what to whom?
Joseph M. Denning
Xiaohan
Bryor Snefjella
Idan A. Blank
50
1
0
23 Apr 2025
Chain-of-Thought Reasoning In The Wild Is Not Always Faithful
Iván Arcuschin
Jett Janiak
Robert Krzyzanowski
Senthooran Rajamanoharan
Neel Nanda
Arthur Conmy
LRM
ReLM
62
6
0
11 Mar 2025
The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems
Richard Ren
Arunim Agarwal
Mantas Mazeika
Cristina Menghini
Robert Vacareanu
...
Matias Geralnik
Adam Khoja
Dean Lee
Summer Yue
Dan Hendrycks
HILM
ALM
88
0
0
05 Mar 2025
Towards Understanding Distilled Reasoning Models: A Representational Approach
David D. Baek
Max Tegmark
LRM
75
2
0
05 Mar 2025
Linear Representations of Political Perspective Emerge in Large Language Models
Junsol Kim
James Evans
Aaron Schein
75
2
0
03 Mar 2025
The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer
Marthe Ballon
Andres Algaba
Vincent Ginis
LRM
ReLM
36
4
0
24 Feb 2025
Lines of Thought in Large Language Models
Raphael Sarfati
Toni J. B. Liu
Nicolas Boullé
Christopher Earls
LRM
VLM
LM&Ro
58
1
0
17 Feb 2025
The Representation and Recall of Interwoven Structured Knowledge in LLMs: A Geometric and Layered Analysis
Ge Lei
Samuel J. Cooper
KELM
47
0
0
15 Feb 2025
Human-like conceptual representations emerge from language prediction
Ningyu Xu
Qi Zhang
Chao Du
Qiang Luo
Xipeng Qiu
Xuanjing Huang
Menghan Zhang
70
0
0
21 Jan 2025
Revisiting Rogers' Paradox in the Context of Human-AI Interaction
K. M. Collins
Umang Bhatt
Ilia Sucholutsky
42
1
0
16 Jan 2025
ICLR: In-Context Learning of Representations
Core Francisco Park
Andrew Lee
Ekdeep Singh Lubana
Yongyi Yang
Maya Okawa
Kento Nishi
Martin Wattenberg
Hidenori Tanaka
AIFin
114
3
0
29 Dec 2024
JailbreakLens: Interpreting Jailbreak Mechanism in the Lens of Representation and Circuit
Zeqing He
Zhibo Wang
Zhixuan Chu
Huiyu Xu
Rui Zheng
Kui Ren
Chun Chen
52
3
0
17 Nov 2024
All or None: Identifiable Linear Properties of Next-token Predictors in Language Modeling
Emanuele Marconato
Sébastien Lachapelle
Sebastian Weichwald
Luigi Gresele
64
3
0
30 Oct 2024
Improving Instruction-Following in Language Models through Activation Steering
Alessandro Stolfo
Vidhisha Balachandran
Safoora Yousefi
Eric Horvitz
Besmira Nushi
LLMSV
52
14
0
15 Oct 2024
The Geometry of Concepts: Sparse Autoencoder Feature Structure
Yuxiao Li
Eric J. Michaud
David D. Baek
Joshua Engels
Xiaoqing Sun
Max Tegmark
50
7
0
10 Oct 2024
Efficient Distribution Matching of Representations via Noise-Injected Deep InfoMax
I. Butakov
Alexander Sememenko
Alexander Tolmachev
Andrey Gladkov
Marina Munkhoeva
Alexey Frolov
32
0
0
09 Oct 2024
Programming Refusal with Conditional Activation Steering
Bruce W. Lee
Inkit Padhi
K. Ramamurthy
Erik Miehling
Pierre L. Dognin
Manish Nagireddy
Amit Dhurandhar
LLMSV
91
13
0
06 Sep 2024
Understanding Generative AI Content with Embedding Models
Max Vargas
Reilly Cannon
A. Engel
Anand D. Sarwate
Tony Chiang
47
3
0
19 Aug 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai
Yilun Zhou
Shi Feng
Abulhair Saparov
Ziyu Yao
75
19
0
02 Jul 2024
Monitoring Latent World States in Language Models with Propositional Probes
Jiahai Feng
Stuart Russell
Jacob Steinhardt
HILM
32
6
0
27 Jun 2024
AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Teun van der Weij
Felix Hofstätter
Ollie Jaffe
Samuel F. Brown
Francis Rhys Ward
ELM
37
23
0
11 Jun 2024
Feature contamination: Neural networks learn uncorrelated features and fail to generalize
Tianren Zhang
Chujie Zhao
Guanyu Chen
Yizhou Jiang
Feng Chen
OOD
MLT
OODD
69
3
0
05 Jun 2024
The Geometry of Categorical and Hierarchical Concepts in Large Language Models
Kiho Park
Yo Joong Choe
Yibo Jiang
Victor Veitch
50
25
0
03 Jun 2024
Survival of the Fittest Representation: A Case Study with Modular Addition
Xiaoman Delores Ding
Zifan Carl Guo
Eric J. Michaud
Ziming Liu
Max Tegmark
34
3
0
27 May 2024
Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability
Shenyuan Gao
Jiazhi Yang
Li Chen
Kashyap Chitta
Yihang Qiu
Andreas Geiger
Jun Zhang
Hongyang Li
60
75
0
27 May 2024
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models
Bernal Jiménez Gutiérrez
Yiheng Shu
Yu Gu
Michihiro Yasunaga
Yu-Chuan Su
RALM
CLL
60
33
0
23 May 2024
The Unreasonable Ineffectiveness of the Deeper Layers
Andrey Gromov
Kushal Tirumala
Hassan Shapourian
Paolo Glorioso
Daniel A. Roberts
41
79
0
26 Mar 2024
Language Models Represent Beliefs of Self and Others
Wentao Zhu
Zhining Zhang
Yizhou Wang
MILM
LRM
38
7
0
28 Feb 2024
On the Challenges and Opportunities in Generative AI
Laura Manduchi
Kushagra Pandey
Robert Bamler
Ryan Cotterell
Sina Daubener
...
F. Wenzel
Frank Wood
Stephan Mandt
Vincent Fortuin
Vincent Fortuin
56
17
0
28 Feb 2024
Opening the AI black box: program synthesis via mechanistic interpretability
Eric J. Michaud
Isaac Liao
Vedang Lad
Ziming Liu
Anish Mudide
Chloe Loughridge
Zifan Carl Guo
Tara Rezaei Kheirkhah
Mateja Vukelić
Max Tegmark
23
12
0
07 Feb 2024
Towards Urban General Intelligence: A Review and Outlook of Urban Foundation Models
Weijiao Zhang
Jindong Han
Zhao Xu
Hang Ni
Hao Liu
Hui Xiong
Hui Xiong
AI4CE
77
15
0
30 Jan 2024
Black-Box Access is Insufficient for Rigorous AI Audits
Stephen Casper
Carson Ezell
Charlotte Siegmann
Noam Kolt
Taylor Lynn Curtis
...
Michael Gerovitch
David Bau
Max Tegmark
David M. Krueger
Dylan Hadfield-Menell
AAML
13
76
0
25 Jan 2024
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
Simon Lermen
Charlie Rogers-Smith
Jeffrey Ladish
ALM
15
80
0
31 Oct 2023
On Generative Agents in Recommendation
An Zhang
Yuxin Chen
Leheng Sheng
Xiang Wang
Tat-Seng Chua
30
43
0
16 Oct 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee
Neel Nanda
Matthew Pauly
Katherine Harvey
Dmitrii Troitskii
Dimitris Bertsimas
MILM
153
186
0
02 May 2023
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model
Michael Hanna
Ollie Liu
Alexandre Variengien
LRM
189
119
0
30 Apr 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Mor Geva
Jasmijn Bastings
Katja Filippova
Amir Globerson
KELM
189
261
0
28 Apr 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
251
2,232
0
22 Mar 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
210
494
0
01 Nov 2022
Toy Models of Superposition
Nelson Elhage
Tristan Hume
Catherine Olsson
Nicholas Schiefer
T. Henighan
...
Sam McCandlish
Jared Kaplan
Dario Amodei
Martin Wattenberg
C. Olah
AAML
MILM
120
316
0
21 Sep 2022
Do Language Models Know the Way to Rome?
Bastien Liétard
Mostafa Abdou
Anders Søgaard
46
15
0
16 Sep 2021
Probing Classifiers: Promises, Shortcomings, and Advances
Yonatan Belinkov
224
404
0
24 Feb 2021
1