2410.16484
Identifying Sub-networks in Neural Networks via Functionally Similar Representations
21 October 2024
Tian Gao
Amit Dhurandhar
Karthikeyan N. Ramamurthy
Dennis L. Wei
Papers citing
"Identifying Sub-networks in Neural Networks via Functionally Similar Representations"
40 / 40 papers shown
Title
Hypothesis Testing the Circuit Hypothesis in LLMs
Neural Information Processing Systems (NeurIPS), 2024
Claudia Shi
Nicolas Beltran-Velez
Achille Nazaret
Carolina Zheng
Adrià Garriga-Alonso
Andrew Jesson
Maggie Makar
David M. Blei
209
17
0
16 Oct 2024
The Geometry of Categorical and Hierarchical Concepts in Large Language Models
Kiho Park
Yo Joong Choe
Yibo Jiang
Victor Veitch
430
62
0
03 Jun 2024
InversionView: A General-Purpose Method for Reading Information from Neural Activations
Xinting Huang
Madhur Panwar
Navin Goyal
Michael Hahn
283
9
0
27 May 2024
PURE: Turning Polysemantic Neurons Into Pure Features by Identifying Relevant Circuits
Maximilian Dreyer
Erblina Purelku
Johanna Vielhaben
Wojciech Samek
Sebastian Lapuschkin
MILM
119
21
0
09 Apr 2024
The Heuristic Core: Understanding Subnetwork Generalization in Pretrained Language Models
Adithya Bhaskar
Dan Friedman
Danqi Chen
309
9
0
06 Mar 2024
AtP*: An efficient and scalable method for localizing LLM behaviour to components
János Kramár
Tom Lieberum
Rohin Shah
Neel Nanda
KELM
225
69
0
01 Mar 2024
NeuroPrune: A Neuro-inspired Topological Sparse Training Algorithm for Large Language Models
Amit Dhurandhar
Tejaswini Pedapati
Ronny Luss
Soham Dan
Aurélie C. Lozano
Payel Das
Georgios Kollias
319
3
0
28 Feb 2024
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations
Jing Huang
Zhengxuan Wu
Christopher Potts
Mor Geva
Atticus Geiger
248
53
0
27 Feb 2024
Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals
Francesco Ortu
Zhijing Jin
Diego Doimo
Mrinmaya Sachan
Alberto Cazzaniga
Bernhard Schölkopf
161
30
0
18 Feb 2024
Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yifan Hou
Jiaoda Li
Yu Fei
Alessandro Stolfo
Wangchunshu Zhou
Guangtao Zeng
Antoine Bosselut
Mrinmaya Sachan
LRM
337
57
0
23 Oct 2023
Learning the greatest common divisor: explaining transformer predictions
International Conference on Learning Representations (ICLR), 2023
François Charton
225
27
0
29 Aug 2023
Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP
Vedant Palit
Rohan Pandey
Aryaman Arora
Paul Pu Liang
204
43
0
27 Aug 2023
Linearity of Relation Decoding in Transformer Language Models
International Conference on Learning Representations (ICLR), 2023
Evan Hernandez
Arnab Sen Sharma
Tal Haklay
Kevin Meng
Martin Wattenberg
Jacob Andreas
Yonatan Belinkov
David Bau
KELM
283
130
0
17 Aug 2023
Revisiting invariances and introducing priors in Gromov-Wasserstein distances
Pinar Demetci
Quang-Huy Tran
I. Redko
Ritambhara Singh
OT
140
1
0
19 Jul 2023
The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks
Neural Information Processing Systems (NeurIPS), 2023
Ziqian Zhong
Ziming Liu
Max Tegmark
Jacob Andreas
202
132
0
30 Jun 2023
Similarity of Neural Network Models: A Survey of Functional and Representational Measures
ACM Computing Surveys (ACM Comput. Surv.), 2023
Max Klabunde
Tobias Schumacher
M. Strohmaier
Florian Lemmerich
417
102
0
10 May 2023
ZipIt! Merging Models from Different Tasks without Training
International Conference on Learning Representations (ICLR), 2023
George Stoica
Daniel Bolya
J. Bjorner
Pratik Ramesh
Taylor N. Hearn
Judy Hoffman
VLM
MoMe
350
158
0
04 May 2023
Towards Automated Circuit Discovery for Mechanistic Interpretability
Neural Information Processing Systems (NeurIPS), 2023
Arthur Conmy
Augustine N. Mavor-Parker
Aengus Lynch
Stefan Heimersheim
Adrià Garriga-Alonso
413
428
0
28 Apr 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Mor Geva
Jasmijn Bastings
Katja Filippova
Amir Globerson
KELM
573
407
0
28 Apr 2023
TRAK: Attributing Model Behavior at Scale
International Conference on Machine Learning (ICML), 2023
Sung Min Park
Kristian Georgiev
Andrew Ilyas
Guillaume Leclerc
Aleksander Madry
TDI
320
223
0
24 Mar 2023
Progress measures for grokking via mechanistic interpretability
International Conference on Learning Representations (ICLR), 2023
Neel Nanda
Lawrence Chan
Tom Lieberum
Jess Smith
Jacob Steinhardt
334
603
0
12 Jan 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
International Conference on Learning Representations (ICLR), 2022
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
508
747
0
01 Nov 2022
On the Origins of the Block Structure Phenomenon in Neural Network Representations
Thao Nguyen
M. Raghu
Simon Kornblith
147
14
0
15 Feb 2022
Locating and Editing Factual Associations in GPT
Neural Information Processing Systems (NeurIPS), 2022
Kevin Meng
David Bau
A. Andonian
Yonatan Belinkov
KELM
831
1,858
0
10 Feb 2022
Differentiable Subset Pruning of Transformer Heads
Transactions of the Association for Computational Linguistics (TACL), 2021
Jiaoda Li
Robert Bamler
Mrinmaya Sachan
270
62
0
10 Aug 2021
Grounding Representation Similarity with Statistical Testing
Frances Ding
Jean-Stanislas Denain
Jacob Steinhardt
176
32
0
03 Aug 2021
Model Compression Using Optimal Transport
Suhas Lohit
Michael J. Jones
181
9
0
07 Dec 2020
Model Fusion via Optimal Transport
Neural Information Processing Systems (NeurIPS), 2019
Sidak Pal Singh
Martin Jaggi
MoMe
FedML
502
282
0
12 Oct 2019
The Shape of Data: Intrinsic Distance for Data Distributions
International Conference on Learning Representations (ICLR), 2019
Anton Tsitsulin
Marina Munkhoeva
Davide Mottin
Panagiotis Karras
A. Bronstein
Ivan Oseledets
Emmanuel Müller
188
57
0
27 May 2019
Similarity of Neural Network Representations Revisited
International Conference on Machine Learning (ICML), 2019
Simon Kornblith
Mohammad Norouzi
Honglak Lee
Geoffrey E. Hinton
1.0K
1,714
0
01 May 2019
Representation Similarity Analysis for Efficient Task taxonomy & Transfer Learning
Kshitij Dwivedi
Gemma Roig
154
163
0
26 Apr 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
2.8K
106,430
0
11 Oct 2018
Insights on representational similarity in neural networks with canonical correlation
Ari S. Morcos
M. Raghu
Samy Bengio
DRL
301
479
0
14 Jun 2018
Optimal Transport for structured data with application on graphs
Titouan Vayer
Laetitia Chapel
Rémi Flamary
R. Tavenard
Nicolas Courty
OT
229
306
0
23 May 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
1.6K
7,907
0
20 Apr 2018
Attention Is All You Need
Neural Information Processing Systems (NeurIPS), 2017
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
2.4K
157,232
0
12 Jun 2017
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
3.5K
214,123
0
10 Dec 2015
Convergent Learning: Do different neural networks learn the same representations?
Yixuan Li
J. Yosinski
Jeff Clune
Hod Lipson
John E. Hopcroft
SSL
257
395
0
24 Nov 2015
Sinkhorn Distances: Lightspeed Computation of Optimal Transportation Distances
Neural Information Processing Systems (NeurIPS), 2013
Marco Cuturi
OT
652
4,829
0
04 Jun 2013
A Tutorial on Spectral Clustering
Statistics and computing (Stat. Comput.), 2007
U. V. Luxburg
548
10,975
0
01 Nov 2007