Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
arXiv: 2408.05147
9 August 2024
Tom Lieberum, Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Nicolas Sonnerat, Vikrant Varma, János Kramár, Anca Dragan, Rohin Shah, Neel Nanda

Papers citing "Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2" (16 of 66 papers shown)

Mechanistic Permutability: Match Features Across Layers
Nikita Balagansky, Ian Maksimov, Daniil Gavrilov
10 Oct 2024

The Geometry of Concepts: Sparse Autoencoder Feature Structure
Yuxiao Li, Eric J. Michaud, David D. Baek, Joshua Engels, Xiaoqing Sun, Max Tegmark
10 Oct 2024

SAGE: Scalable Ground Truth Evaluations for Large Sparse Autoencoders
Constantin Venhoff, Anisoara Calinescu, Philip H. S. Torr, Christian Schroeder de Witt
09 Oct 2024

Sparse Autoencoders Reveal Universal Feature Spaces Across Large Language Models
Michael Lan, Philip H. S. Torr, Austin Meek, Ashkan Khakzar, David M. Krueger, Fazl Barez
09 Oct 2024

Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models
Can Demircan, Tankred Saanum, A. Jagadish, Marcel Binz, Eric Schulz
02 Oct 2024

Towards Inference-time Category-wise Safety Steering for Large Language Models
Amrita Bhattacharjee, Shaona Ghosh, Traian Rebedea, Christopher Parisien
02 Oct 2024

A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders
David Chanin, James Wilken-Smith, Tomáš Dulka, Hardik Bhatnagar, Joseph Bloom
22 Sep 2024

Residual Stream Analysis with Multi-Layer SAEs
Tim Lawson, Lucy Farnik, Conor Houghton, Laurence Aitchison
06 Sep 2024

Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small
Maheep Chaudhary, Atticus Geiger
05 Sep 2024

Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models
Adam Karvonen, Benjamin Wright, Can Rager, Rico Angell, Jannik Brinkmann, Logan Smith, C. M. Verdun, David Bau, Samuel Marks
31 Jul 2024

A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai, Yilun Zhou, Shi Feng, Abulhair Saparov, Ziyu Yao
02 Jul 2024

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt
01 Nov 2022

Toy Models of Superposition
Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, T. Henighan, ..., Sam McCandlish, Jared Kaplan, Dario Amodei, Martin Wattenberg, C. Olah
21 Sep 2022

Probing Classifiers: Promises, Shortcomings, and Advances
Yonatan Belinkov
24 Feb 2021

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
17 Sep 2019

Efficient Estimation of Word Representations in Vector Space
Tomáš Mikolov, Kai Chen, G. Corrado, J. Dean
16 Jan 2013