Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2408.12664
Cited By
Multilevel Interpretability Of Artificial Neural Networks: Leveraging Framework And Methods From Neuroscience
22 August 2024
Zhonghao He
Jascha Achterberg
Katie Collins
Kevin K. Nejad
Danyal Akarca
Yinzhu Yang
Wes Gurnee
Ilia Sucholutsky
Yuhan Tang
Rebeca Ianov
George Ogden
Chole Li
Kai J. Sandbrink
Stephen Casper
Anna Ivanova
Grace W. Lindsay
AI4CE
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Multilevel Interpretability Of Artificial Neural Networks: Leveraging Framework And Methods From Neuroscience"
9 / 9 papers shown
Title
Towards Uncovering How Large Language Model Works: An Explainability Perspective
Haiyan Zhao
Fan Yang
Bo Shen
Himabindu Lakkaraju
Mengnan Du
30
10
0
16 Feb 2024
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
207
486
0
01 Nov 2022
Omnigrok: Grokking Beyond Algorithmic Data
Ziming Liu
Eric J. Michaud
Max Tegmark
54
76
0
03 Oct 2022
In-context Learning and Induction Heads
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova Dassarma
...
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
237
453
0
24 Sep 2022
Toy Models of Superposition
Nelson Elhage
Tristan Hume
Catherine Olsson
Nicholas Schiefer
T. Henighan
...
Sam McCandlish
Jared Kaplan
Dario Amodei
Martin Wattenberg
C. Olah
AAML
MILM
117
314
0
21 Sep 2022
Probing Classifiers: Promises, Shortcomings, and Advances
Yonatan Belinkov
221
291
0
24 Feb 2021
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
220
3,054
0
23 Jan 2020
Towards A Rigorous Science of Interpretable Machine Learning
Finale Doshi-Velez
Been Kim
XAI
FaML
219
3,658
0
28 Feb 2017
Demixed principal component analysis of population activity in higher cortical areas reveals independent representation of task parameters
D. Kobak
Wieland Brendel
C. Constantinidis
C. Feierstein
Adam Kepecs
Z. Mainen
R. Romo
Xue-Lian Qi
N. Uchida
C. Machens
27
462
0
22 Oct 2014
1