Identifying Sub-networks in Neural Networks via Functionally Similar Representations

21 October 2024

Tian Gao

Amit Dhurandhar

Karthikeyan N. Ramamurthy

Dennis L. Wei

Papers citing "Identifying Sub-networks in Neural Networks via Functionally Similar Representations"

Title
Hypothesis Testing the Circuit Hypothesis in LLMs. Neural Information Processing Systems (NeurIPS), 2024 Claudia Shi Nicolas Beltran-Velez Achille Nazaret Carolina Zheng Adrià Garriga-Alonso Andrew Jesson Maggie Makar David M. Blei 209 17 0 16 Oct 2024
The Geometry of Categorical and Hierarchical Concepts in Large Language Models Kiho Park Yo Joong Choe Yibo Jiang Victor Veitch 430 62 0 03 Jun 2024
InversionView: A General-Purpose Method for Reading Information from Neural Activations Xinting Huang Madhur Panwar Navin Goyal Michael Hahn 283 9 0 27 May 2024
PURE: Turning Polysemantic Neurons Into Pure Features by Identifying Relevant Circuits Maximilian Dreyer Erblina Purelku Johanna Vielhaben Wojciech Samek Sebastian Lapuschkin MILM 119 21 0 09 Apr 2024
The Heuristic Core: Understanding Subnetwork Generalization in Pretrained Language Models Adithya Bhaskar Dan Friedman Danqi Chen 309 9 0 06 Mar 2024
AtP*: An efficient and scalable method for localizing LLM behaviour to components János Kramár Tom Lieberum Rohin Shah Neel Nanda KELM 225 69 0 01 Mar 2024
NeuroPrune: A Neuro-inspired Topological Sparse Training Algorithm for Large Language Models Amit Dhurandhar Tejaswini Pedapati Ronny Luss Soham Dan Aurélie C. Lozano Payel Das Georgios Kollias 319 3 0 28 Feb 2024
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations Jing Huang Zhengxuan Wu Christopher Potts Mor Geva Atticus Geiger 248 53 0 27 Feb 2024
Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals Francesco Ortu Zhijing Jin Diego Doimo Mrinmaya Sachan Alberto Cazzaniga Bernhard Schölkopf 161 30 0 18 Feb 2024
Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023 Yifan Hou Jiaoda Li Yu Fei Alessandro Stolfo Wangchunshu Zhou Guangtao Zeng Antoine Bosselut Mrinmaya Sachan LRM 337 57 0 23 Oct 2023
Learning the greatest common divisor: explaining transformer predictions. International Conference on Learning Representations (ICLR), 2023 François Charton 225 27 0 29 Aug 2023
Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP Vedant Palit Rohan Pandey Aryaman Arora Paul Pu Liang 204 43 0 27 Aug 2023
Linearity of Relation Decoding in Transformer Language Models. International Conference on Learning Representations (ICLR), 2023 Evan Hernandez Arnab Sen Sharma Tal Haklay Kevin Meng Martin Wattenberg Jacob Andreas Yonatan Belinkov David Bau KELM 283 130 0 17 Aug 2023
Revisiting invariances and introducing priors in Gromov-Wasserstein distances Pinar Demetci Quang-Huy Tran I. Redko Ritambhara Singh OT 140 1 0 19 Jul 2023
The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks. Neural Information Processing Systems (NeurIPS), 2023 Ziqian Zhong Ziming Liu Max Tegmark Jacob Andreas 202 132 0 30 Jun 2023
Similarity of Neural Network Models: A Survey of Functional and Representational Measures. ACM Computing Surveys (ACM Comput. Surv.), 2023 Max Klabunde Tobias Schumacher M. Strohmaier Florian Lemmerich 417 102 0 10 May 2023
ZipIt! Merging Models from Different Tasks without Training. International Conference on Learning Representations (ICLR), 2023 George Stoica Daniel Bolya J. Bjorner Pratik Ramesh Taylor N. Hearn Judy Hoffman VLM MoMe 350 158 0 04 May 2023
Towards Automated Circuit Discovery for Mechanistic Interpretability. Neural Information Processing Systems (NeurIPS), 2023 Arthur Conmy Augustine N. Mavor-Parker Aengus Lynch Stefan Heimersheim Adrià Garriga-Alonso 413 428 0 28 Apr 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023 Mor Geva Jasmijn Bastings Katja Filippova Amir Globerson KELM 573 407 0 28 Apr 2023
TRAK: Attributing Model Behavior at Scale. International Conference on Machine Learning (ICML), 2023 Sung Min Park Kristian Georgiev Andrew Ilyas Guillaume Leclerc Aleksander Madry TDI 320 223 0 24 Mar 2023
Progress measures for grokking via mechanistic interpretability. International Conference on Learning Representations (ICLR), 2023 Neel Nanda Lawrence Chan Tom Lieberum Jess Smith Jacob Steinhardt 334 603 0 12 Jan 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small. International Conference on Learning Representations (ICLR), 2022 Kevin Wang Alexandre Variengien Arthur Conmy Buck Shlegeris Jacob Steinhardt 508 747 0 01 Nov 2022
On the Origins of the Block Structure Phenomenon in Neural Network Representations Thao Nguyen M. Raghu Simon Kornblith 147 14 0 15 Feb 2022
Locating and Editing Factual Associations in GPT. Neural Information Processing Systems (NeurIPS), 2022 Kevin Meng David Bau A. Andonian Yonatan Belinkov KELM 831 1,858 0 10 Feb 2022
Differentiable Subset Pruning of Transformer Heads. Transactions of the Association for Computational Linguistics (TACL), 2021 Jiaoda Li Robert Bamler Mrinmaya Sachan 270 62 0 10 Aug 2021
Grounding Representation Similarity with Statistical Testing Frances Ding Jean-Stanislas Denain Jacob Steinhardt 176 32 0 03 Aug 2021
Model Compression Using Optimal Transport Suhas Lohit Michael J. Jones 181 9 0 07 Dec 2020
Model Fusion via Optimal Transport. Neural Information Processing Systems (NeurIPS), 2019 Sidak Pal Singh Martin Jaggi MoMe FedML 502 282 0 12 Oct 2019
The Shape of Data: Intrinsic Distance for Data Distributions. International Conference on Learning Representations (ICLR), 2019 Anton Tsitsulin Marina Munkhoeva Davide Mottin Panagiotis Karras A. Bronstein Ivan Oseledets Emmanuel Müller 188 57 0 27 May 2019
Similarity of Neural Network Representations Revisited. International Conference on Machine Learning (ICML), 2019 Simon Kornblith Mohammad Norouzi Honglak Lee Geoffrey E. Hinton 1.0K 1,714 0 01 May 2019
Representation Similarity Analysis for Efficient Task taxonomy & Transfer Learning Kshitij Dwivedi Gemma Roig 154 163 0 26 Apr 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova VLM SSL SSeg 2.8K 106,430 0 11 Oct 2018
Insights on representational similarity in neural networks with canonical correlation Ari S. Morcos M. Raghu Samy Bengio DRL 301 479 0 14 Jun 2018
Optimal Transport for structured data with application on graphs Titouan Vayer Laetitia Chapel Rémi Flamary R. Tavenard Nicolas Courty OT 229 306 0 23 May 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 1.6K 7,907 0 20 Apr 2018
Attention Is All You Need. Neural Information Processing Systems (NeurIPS), 2017 Ashish Vaswani Noam M. Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan Gomez Lukasz Kaiser Illia Polosukhin 3DV 2.4K 157,232 0 12 Jun 2017
Deep Residual Learning for Image Recognition Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun MedIm 3.5K 214,123 0 10 Dec 2015
Convergent Learning: Do different neural networks learn the same representations? Yixuan Li J. Yosinski Jeff Clune Hod Lipson John E. Hopcroft SSL 257 395 0 24 Nov 2015
Sinkhorn Distances: Lightspeed Computation of Optimal Transportation Distances. Neural Information Processing Systems (NeurIPS), 2013 Marco Cuturi OT 652 4,829 0 04 Jun 2013
A Tutorial on Spectral Clustering. Statistics and computing (Stat. Comput.), 2007 U. V. Luxburg 548 10,975 0 01 Nov 2007
