2210.13382
Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task
24 October 2022
Kenneth Li
Aspen K. Hopkins
David Bau
Fernanda Viégas
Hanspeter Pfister
Martin Wattenberg
MILM
Papers citing
"Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task"
50 / 200 papers shown
Title
Do Music Generation Models Encode Music Theory?
Megan Wei
Michael Freeman
Chris Donahue
Chen Sun
MGen
28
4
0
01 Oct 2024
Exploring the Learning Capabilities of Language Models using LEVERWORLDS
Eitan Wagner
Amir Feder
Omri Abend
21
0
0
01 Oct 2024
Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution
Haiyan Zhao
Heng Zhao
Bo Shen
Ali Payani
Fan Yang
Mengnan Du
65
4
0
30 Sep 2024
Counterfactual Token Generation in Large Language Models
Ivi Chatzi
N. C. Benz
Eleni Straitouri
Stratis Tsirtsis
Manuel Gomez Rodriguez
LRM
44
3
0
25 Sep 2024
A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders
David Chanin
James Wilken-Smith
Tomáš Dulka
Hardik Bhatnagar
Joseph Bloom
23
21
0
22 Sep 2024
Causal Language Modeling Can Elicit Search and Reasoning Capabilities on Logic Puzzles
Kulin Shah
Nishanth Dikkala
Xin Wang
Rina Panigrahy
ELM
ReLM
LRM
42
12
0
16 Sep 2024
Householder Pseudo-Rotation: A Novel Approach to Activation Editing in LLMs with Direction-Magnitude Perspective
Van-Cuong Pham
Thien Huu Nguyen
LLMSV
43
3
0
16 Sep 2024
Optimal ablation for interpretability
Maximilian Li
Lucas Janson
FAtt
54
2
0
16 Sep 2024
Prevailing Research Areas for Music AI in the Era of Foundation Models
Megan Wei
M. Modrzejewski
Aswin Sivaraman
Dorien Herremans
MedIm
45
1
0
14 Sep 2024
Representational Analysis of Binding in Language Models
Qin Dai
Benjamin Heinzerling
Kentaro Inui
34
2
0
09 Sep 2024
Can Transformers Do Enumerative Geometry?
Baran Hashemi
Roderic G. Corominas
Alessandro Giacchetto
49
2
0
27 Aug 2024
KAN 2.0: Kolmogorov-Arnold Networks Meet Science
Ziming Liu
Pingchuan Ma
Yixuan Wang
Wojciech Matusik
Max Tegmark
48
62
0
19 Aug 2024
Understanding Generative AI Content with Embedding Models
Max Vargas
Reilly Cannon
A. Engel
Anand D. Sarwate
Tony Chiang
65
3
0
19 Aug 2024
The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability
Aaron Mueller
Jannik Brinkmann
Millicent Li
Samuel Marks
Koyena Pal
...
Arnab Sen Sharma
Jiuding Sun
Eric Todd
David Bau
Yonatan Belinkov
CML
55
19
0
02 Aug 2024
Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models
Adam Karvonen
Benjamin Wright
Can Rager
Rico Angell
Jannik Brinkmann
Logan Smith
C. M. Verdun
David Bau
Samuel Marks
38
27
0
31 Jul 2024
Cluster-norm for Unsupervised Probing of Knowledge
Walter Laurito
Sharan Maiya
Grégoire Dhimoïla
Owen Yeung
Kaarel Hänni
31
2
0
26 Jul 2024
Probabilistic Parameter Estimators and Calibration Metrics for Pose Estimation from Image Features
Romeo Valentin
Sydney M. Katz
Joonghyun Lee
Don Walker
Matthew Sorgenfrei
Mykel J. Kochenderfer
41
0
0
23 Jul 2024
Latent Causal Probing: A Formal Perspective on Probing with Causal Models of Data
Charles Jin
Martin Rinard
32
1
0
18 Jul 2024
Mechanistically Interpreting a Transformer-based 2-SAT Solver: An Axiomatic Approach
Nils Palumbo
Ravi Mangal
Zifan Wang
Saranya Vijayakumar
Corina S. Pasareanu
Somesh Jha
44
1
0
18 Jul 2024
Analyzing the Generalization and Reliability of Steering Vectors
Daniel Tan
David Chanin
Aengus Lynch
Dimitrios Kanoulas
Brooks Paige
Adrià Garriga-Alonso
Robert Kirk
LLMSV
89
17
0
17 Jul 2024
States Hidden in Hidden States: LLMs Emerge Discrete State Representations Implicitly
Junhao Chen
Shengding Hu
Zhiyuan Liu
Maosong Sun
LRM
34
5
0
16 Jul 2024
Transforming Agency. On the mode of existence of Large Language Models
Xabier E. Barandiaran
Lola S. Almendros
LLMAG
LM&Ro
48
4
0
15 Jul 2024
Compositional Structures in Neural Embedding and Interaction Decompositions
Matthew Trager
Alessandro Achille
Pramuditha Perera
L. Zancato
Stefano Soatto
CoGe
42
0
0
12 Jul 2024
Transformer Circuit Faithfulness Metrics are not Robust
Joseph Miller
Bilal Chughtai
William Saunders
58
7
0
11 Jul 2024
A Text-to-Game Engine for UGC-Based Role-Playing Games
Lei Zhang
Xuezheng Peng
Shuyi Yang
Feiyang Wang
37
1
0
11 Jul 2024
Identifying the Source of Generation for Large Language Models
Bumjin Park
Jaesik Choi
42
0
0
05 Jul 2024
Over the Edge of Chaos? Excess Complexity as a Roadblock to Artificial General Intelligence
Teo Susnjak
Timothy R. McIntosh
A. Barczak
N. Reyes
Tong Liu
Paul Watters
Malka N. Halgamuge
34
3
0
04 Jul 2024
Monitoring Latent World States in Language Models with Propositional Probes
Jiahai Feng
Stuart Russell
Jacob Steinhardt
HILM
48
8
0
27 Jun 2024
SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation
Zijun Yao
Weijian Qi
Liangming Pan
S. Cao
Linmei Hu
Weichuan Liu
Lei Hou
Juanzi Li
RALM
56
6
0
27 Jun 2024
Does ChatGPT Have a Mind?
Simon Goldstein
B. Levinstein
AI4MH
LRM
44
5
0
27 Jun 2024
Brittle Minds, Fixable Activations: Understanding Belief Representations in Language Models
Matteo Bortoletto
Constantin Ruhdorfer
Lei Shi
Andreas Bulling
AI4MH
LRM
50
4
0
25 Jun 2024
Towards a Science Exocortex
Kevin G. Yager
80
0
0
24 Jun 2024
CLEAR: Can Language Models Really Understand Causal Graphs?
Sirui Chen
Mengying Xu
Kun Wang
Xingyu Zeng
Rui Zhao
Shengjie Zhao
Chaochao Lu
LRM
ELM
42
8
0
24 Jun 2024
Unlocking the Future: Exploring Look-Ahead Planning Mechanistic Interpretability in Large Language Models
Tianyi Men
Pengfei Cao
Zhuoran Jin
Yubo Chen
Kang Liu
Jun Zhao
LLMAG
AIFin
40
6
0
23 Jun 2024
Beyond the Doors of Perception: Vision Transformers Represent Relations Between Objects
Michael A. Lepori
Alexa R. Tartaglini
Wai Keen Vong
Thomas Serre
Brenden M. Lake
Ellie Pavlick
44
2
0
22 Jun 2024
A Notion of Complexity for Theory of Mind via Discrete World Models
X. A. Huang
Emanuele La Malfa
Samuele Marro
Andrea Asperti
Anthony Cohn
Michael Wooldridge
47
6
0
16 Jun 2024
PaCE: Parsimonious Concept Engineering for Large Language Models
Jinqi Luo
Tianjiao Ding
Kwan Ho Ryan Chan
D. Thaker
Aditya Chattopadhyay
Chris Callison-Burch
René Vidal
CVBM
44
7
0
06 Jun 2024
Evaluating the World Model Implicit in a Generative Model
Keyon Vafa
Justin Y. Chen
Jon M. Kleinberg
S. Mullainathan
Ashesh Rambachan
90
30
0
06 Jun 2024
Discovering Bias in Latent Space: An Unsupervised Debiasing Approach
Dyah Adila
Shuai Zhang
Boran Han
Yuyang Wang
AAML
LLMSV
36
6
0
05 Jun 2024
The Geometry of Categorical and Hierarchical Concepts in Large Language Models
Kiho Park
Yo Joong Choe
Yibo Jiang
Victor Veitch
55
28
0
03 Jun 2024
Evidence of Learned Look-Ahead in a Chess-Playing Neural Network
Erik Jenner
Shreyas Kapur
Vasil Georgiev
Cameron Allen
Scott Emmons
Stuart J. Russell
40
11
0
02 Jun 2024
Standards for Belief Representations in LLMs
Daniel A. Herrmann
B. Levinstein
49
9
0
31 May 2024
InversionView: A General-Purpose Method for Reading Information from Neural Activations
Xinting Huang
Madhur Panwar
Navin Goyal
Michael Hahn
39
4
0
27 May 2024
From Neurons to Neutrons: A Case Study in Interpretability
O. Kitouni
Niklas Nolte
Víctor Samuel Pérez-Díaz
S. Trifinopoulos
Mike Williams
MILM
27
1
0
27 May 2024
Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability
Shenyuan Gao
Jiazhi Yang
Li Chen
Kashyap Chitta
Yihang Qiu
Andreas Geiger
Jun Zhang
Hongyang Li
71
75
0
27 May 2024
Exploring the LLM Journey from Cognition to Expression with Linear Representations
Yuzi Yan
J. Li
Yipin Zhang
Dong Yan
49
1
0
27 May 2024
Transformers represent belief state geometry in their residual stream
A. Shai
Sarah E. Marzen
Lucas Teixeira
Alexander Gietelink Oldenziel
P. Riechers
AI4CE
37
13
0
24 May 2024
What is it for a Machine Learning Model to Have a Capability?
Jacqueline Harding
Nathaniel Sharadin
ELM
40
3
0
14 May 2024
A Philosophical Introduction to Language Models - Part II: The Way Forward
Raphael Milliere
Cameron Buckner
LRM
66
14
0
06 May 2024
Mechanistic Interpretability for AI Safety -- A Review
Leonard Bereska
E. Gavves
AI4CE
50
118
0
22 Apr 2024