arXiv:2304.12918
N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language Models
22 April 2023
Alex Foote
Neel Nanda
Esben Kran
Ioannis Konstas
Fazl Barez
MILM
Papers citing "N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language Models" (6 papers)
Self-Ablating Transformers: More Interpretability, Less Sparsity
Jeremias Ferrao, Luhan Mikaelson, Keenan Pepper, Natalia Perez-Campanero Antolin
MILM · 01 May 2025
Explaining black box text modules in natural language with language models
Chandan Singh, Aliyah R. Hsu, Richard Antonello, Shailee Jain, Alexander G. Huth, Bin Yu, Jianfeng Gao
MILM · 17 May 2023
Toy Models of Superposition
Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, T. Henighan, ..., Sam McCandlish, Jared Kaplan, Dario Amodei, Martin Wattenberg, C. Olah
AAML, MILM · 21 Sep 2022
Unsolved Problems in ML Safety
Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt
28 Sep 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
AIMat · 31 Dec 2020
Similarity Analysis of Contextual Word Representation Models
John M. Wu, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, James R. Glass
03 May 2020