arXiv:2211.12312
Interpreting Neural Networks through the Polytope Lens
22 November 2022
Sid Black, Lee D. Sharkey, Léo Grinsztajn, Eric Winsor, Daniel A. Braun, Jacob Merizian, Kip Parker, Carlos Ramón Guevara, Beren Millidge, Gabriel Alfour, Connor Leahy
FAtt
MILM
Papers citing "Interpreting Neural Networks through the Polytope Lens" (20 of 20 shown)
On Linear Representations and Pretraining Data Frequency in Language Models
Jack Merullo, Noah A. Smith, Sarah Wiegreffe, Yanai Elazar
35
0
0
16 Apr 2025
Enhancing Neural Network Interpretability with Feature-Aligned Sparse Autoencoders
Luke Marks, Alasdair Paren, David M. Krueger, Fazl Barez
AAML
27
4
0
02 Nov 2024
SAGE: Scalable Ground Truth Evaluations for Large Sparse Autoencoders
Constantin Venhoff, Anisoara Calinescu, Philip H. S. Torr, Christian Schroeder de Witt
33
0
0
09 Oct 2024
Characterizing stable regions in the residual stream of LLMs
Jett Janiak, Jacek Karwowski, Chatrik Singh Mangat, Giorgi Giglemiani, Nora Petrova, Stefan Heimersheim
44
1
0
25 Sep 2024
TracrBench: Generating Interpretability Testbeds with Large Language Models
Hannes Thurnherr, Jérémy Scheurer
46
3
0
07 Sep 2024
The Mechanics of Conceptual Interpretation in GPT Models: Interpretative Insights
Nura Aljaafari, Danilo S. Carvalho, André Freitas
KELM
32
0
0
05 Aug 2024
Analyzing the Generalization and Reliability of Steering Vectors
Daniel Tan, David Chanin, Aengus Lynch, Dimitrios Kanoulas, Brooks Paige, Adrià Garriga-Alonso, Robert Kirk
LLMSV
84
16
0
17 Jul 2024
Functional Faithfulness in the Wild: Circuit Discovery with Differentiable Computation Graph Pruning
Lei Yu, Jingcheng Niu, Zining Zhu, Gerald Penn
36
5
0
04 Jul 2024
Sharing Matters: Analysing Neurons Across Languages and Tasks in LLMs
Weixuan Wang, Barry Haddow, Wei Peng, Alexandra Birch
MILM
35
9
0
13 Jun 2024
Weight-based Decomposition: A Case for Bilinear MLPs
Michael T. Pearce, Thomas Dooms, Alice Rigg
42
1
0
06 Jun 2024
Mechanistic Interpretability for AI Safety -- A Review
Leonard Bereska, E. Gavves
AI4CE
40
111
0
22 Apr 2024
A singular Riemannian Geometry Approach to Deep Neural Networks III. Piecewise Differentiable Layers and Random Walks on n-dimensional Classes
A. Benfenati, A. Marta
24
1
0
09 Apr 2024
Defining Neural Network Architecture through Polytope Structures of Dataset
Sangmin Lee, Abbas Mammadov, Jong Chul Ye
56
0
0
04 Feb 2024
Explainable Artificial Intelligence (XAI) 2.0: A Manifesto of Open Challenges and Interdisciplinary Research Directions
Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, ..., Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, Simone Stumpf
29
189
0
30 Oct 2023
Neural Polytopes
Koji Hashimoto, T. Naito, Hisashi Naito
22
1
0
03 Jul 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii Troitskii, Dimitris Bertsimas
MILM
155
186
0
02 May 2023
Disentangling Neuron Representations with Concept Vectors
Laura O'Mahony, Vincent Andrearczyk, Henning Müller, Mara Graziani
MILM
25
14
0
19 Apr 2023
Break It Down: Evidence for Structural Compositionality in Neural Networks
Michael A. Lepori, Thomas Serre, Ellie Pavlick
33
29
0
26 Jan 2023
Toy Models of Superposition
Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, T. Henighan, ..., Sam McCandlish, Jared Kaplan, Dario Amodei, Martin Wattenberg, C. Olah
AAML
MILM
120
317
0
21 Sep 2022
Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks
Tilman Raukur, A. Ho, Stephen Casper, Dylan Hadfield-Menell
AAML
AI4CE
18
124
0
27 Jul 2022