Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2406.19384
Cited By
The Remarkable Robustness of LLMs: Stages of Inference?
27 June 2024
Vedang Lad
Wes Gurnee
Max Tegmark
Re-assign community
ArXiv
PDF
HTML
Papers citing
"The Remarkable Robustness of LLMs: Stages of Inference?"
16 / 16 papers shown
Title
Decoding Vision Transformers: the Diffusion Steering Lens
Ryota Takatsuki
Sonia Joseph
Ippei Fujisawa
Ryota Kanai
DiffM
30
0
0
18 Apr 2025
Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages
Jannik Brinkmann
Chris Wendler
Christian Bartelt
Aaron Mueller
41
9
0
10 Jan 2025
Decomposing The Dark Matter of Sparse Autoencoders
Joshua Engels
Logan Riggs
Max Tegmark
LLMSV
50
9
0
18 Oct 2024
From Tokens to Words: On the Inner Lexicon of LLMs
Guy Kaplan
Matanel Oren
Yuval Reif
Roy Schwartz
39
12
0
08 Oct 2024
Racing Thoughts: Explaining Contextualization Errors in Large Language Models
Michael A. Lepori
Michael Mozer
Asma Ghandeharioun
LRM
80
1
0
02 Oct 2024
Programming Refusal with Conditional Activation Steering
Bruce W. Lee
Inkit Padhi
K. Ramamurthy
Erik Miehling
Pierre L. Dognin
Manish Nagireddy
Amit Dhurandhar
LLMSV
89
13
0
06 Sep 2024
Transformer Layers as Painters
Qi Sun
Marc Pickett
Aakash Kumar Nain
Llion Jones
AI4CE
29
13
0
12 Jul 2024
Information Flow Routes: Automatically Interpreting Language Models at Scale
Javier Ferrando
Elena Voita
40
10
0
27 Feb 2024
Do Llamas Work in English? On the Latent Language of Multilingual Transformers
Chris Wendler
V. Veselovsky
Giovanni Monea
Robert West
56
95
0
16 Feb 2024
Universal Neurons in GPT2 Language Models
Wes Gurnee
Theo Horsley
Zifan Carl Guo
Tara Rezaei Kheirkhah
Qinyi Sun
Will Hathaway
Neel Nanda
Dimitris Bertsimas
MILM
86
37
0
22 Jan 2024
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
Samuel Marks
Max Tegmark
HILM
91
164
0
10 Oct 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee
Neel Nanda
Matthew Pauly
Katherine Harvey
Dmitrii Troitskii
Dimitris Bertsimas
MILM
153
186
0
02 May 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Mor Geva
Jasmijn Bastings
Katja Filippova
Amir Globerson
KELM
189
260
0
28 Apr 2023
In-context Learning and Induction Heads
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova Dassarma
...
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
240
453
0
24 Sep 2022
Probing Classifiers: Promises, Shortcomings, and Advances
Yonatan Belinkov
221
291
0
24 Feb 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
242
1,977
0
31 Dec 2020
1