Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks

5 October 2020

Papers citing "Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks"

50 / 70 papers shown

An XAI-based Analysis of Shortcut Learning in Neural Networks Phuong Quynh Le Jorg Schlotterer Christin Seifert AAML 29 0 0 22 Apr 2025
MIB: A Mechanistic Interpretability Benchmark Aaron Mueller Atticus Geiger Sarah Wiegreffe Dana Arad Iván Arcuschin ... Alessandro Stolfo Martin Tutek Amir Zur David Bau Yonatan Belinkov 43 1 0 17 Apr 2025
Are formal and functional linguistic mechanisms dissociated in language models? Michael Hanna Sandro Pezzelle Yonatan Belinkov 47 0 0 14 Mar 2025
Building Bridges, Not Walls -- Advancing Interpretability by Unifying Feature, Data, and Model Component Attribution Shichang Zhang Tessa Han Usha Bhalla Hima Lakkaraju FAtt 147 0 0 17 Feb 2025
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models Julie Kallini Shikhar Murty Christopher D. Manning Christopher Potts Róbert Csordás 34 2 0 28 Oct 2024
Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models Philipp Mondorf Sondre Wold Barbara Plank 34 0 0 02 Oct 2024
Breaking Neural Network Scaling Laws with Modularity Akhilan Boopathy Sunshine Jiang William Yue Jaedong Hwang Abhiram Iyer Ila Fiete OOD 39 2 0 09 Sep 2024
Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small Maheep Chaudhary Atticus Geiger 26 13 0 05 Sep 2024
Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations Róbert Csordás Christopher Potts Christopher D. Manning Atticus Geiger GAN 28 15 0 20 Aug 2024
Block-Operations: Using Modular Routing to Improve Compositional Generalization Florian Dietz Dietrich Klakow AI4CE 19 0 0 01 Aug 2024
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs Nitay Calderon Roi Reichart 38 10 0 27 Jul 2024
Tracking linguistic information in transformer-based sentence embeddings through targeted sparsification Vivi Nastase Paola Merlo 28 2 0 25 Jul 2024
Out of spuriousity: Improving robustness to spurious correlations without group annotations Phuong Quynh Le Jorg Schlotterer Christin Seifert CML OOD 37 2 0 20 Jul 2024
How and where does CLIP process negation? Vincent Quantmeyer Pablo Mosteiro Albert Gatt CoGe 29 6 0 15 Jul 2024
Functional Faithfulness in the Wild: Circuit Discovery with Differentiable Computation Graph Pruning Lei Yu Jingcheng Niu Zining Zhu Gerald Penn 38 5 0 04 Jul 2024
Are there identifiable structural parts in the sentence embedding whole? Vivi Nastase Paola Merlo 32 3 0 24 Jun 2024
METRIK: Measurement-Efficient Randomized Controlled Trials using Transformers with Input Masking S. Lala Niraj K. Jha 16 0 0 24 Jun 2024
When Parts are Greater Than Sums: Individual LLM Components Can Outperform Full Models Ting-Yun Chang Jesse Thomason Robin Jia 40 4 0 19 Jun 2024
Interpretability of Language Models via Task Spaces Lucas Weber Jaap Jumelet Elia Bruni Dieuwke Hupkes 35 4 0 10 Jun 2024
Debiasing surgeon: fantastic weights and how to find them Rémi Nahon Ivan Luiz De Moura Matos Van-Tam Nguyen Enzo Tartaglione 36 1 0 21 Mar 2024
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations Jing-ling Huang Zhengxuan Wu Christopher Potts Mor Geva Atticus Geiger 59 26 0 27 Feb 2024
Assessing the Portability of Parameter Matrices Trained by Parameter-Efficient Finetuning Methods Mohammed Sabry Anya Belz 31 0 0 25 Jan 2024
A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments Zhengxuan Wu Atticus Geiger Jing-ling Huang Aryaman Arora Thomas F. Icard Christopher Potts Noah D. Goodman 33 6 0 23 Jan 2024
Compressing Image-to-Image Translation GANs Using Local Density Structures on Their Learned Manifold Alireza Ganjdanesh Shangqian Gao Hirad Alipanah Heng-Chiao Huang GAN 27 6 0 22 Dec 2023
Examining Modularity in Multilingual LMs via Language-Specialized Subnetworks Rochelle Choenni Ekaterina Shutova Daniel H Garrette 27 8 0 14 Nov 2023
Uncovering Intermediate Variables in Transformers using Circuit Probing Michael A. Lepori Thomas Serre Ellie Pavlick 75 7 0 07 Nov 2023
Winning Prize Comes from Losing Tickets: Improve Invariant Learning by Exploring Variant Parameters for Out-of-Distribution Generalization Zhuo Huang Muyang Li Li Shen Jun-chen Yu Chen Gong Bo Han Tongliang Liu OOD 38 8 0 25 Oct 2023
Visually Grounded Continual Language Learning with Selective Specialization Kyra Ahrens Lennart Bengtson Jae Hee Lee Stefan Wermter 19 0 0 24 Oct 2023
Unlocking Emergent Modularity in Large Language Models Zihan Qiu Zeyu Huang Jie Fu 22 8 0 17 Oct 2023
Instilling Inductive Biases with Subnetworks Enyan Zhang Michael A. Lepori Ellie Pavlick AI4CE 8 4 0 17 Oct 2023
Discovering Knowledge-Critical Subnetworks in Pretrained Language Models Deniz Bayazit Negar Foroutan Zeming Chen Gail Weiss Antoine Bosselut KELM 24 13 0 04 Oct 2023
Modularity in Deep Learning: A Survey Haozhe Sun Isabelle Guyon MoMe 30 2 0 02 Oct 2023
NeuroSurgeon: A Toolkit for Subnetwork Analysis Michael A. Lepori Ellie Pavlick Thomas Serre 11 7 0 01 Sep 2023
Differentiable Weight Masks for Domain Transfer Samarth Khanna Skanda Vaidyanath Akash Velu 21 0 0 26 Aug 2023
Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-trained Vision-Language Models Kecheng Zheng Wei Wu Ruili Feng Kai Zhu Jiawei Liu Deli Zhao Zhengjun Zha Wei Chen Yujun Shen VLM 19 8 0 27 Jul 2023
Causal interventions expose implicit situation models for commonsense language understanding Takateru Yamakoshi James L. McClelland A. Goldberg Robert D. Hawkins 17 6 0 06 Jun 2023
Soft Merging of Experts with Adaptive Routing Mohammed Muqeeth Haokun Liu Colin Raffel MoMe MoE 27 45 0 06 Jun 2023
Neural Sculpting: Uncovering hierarchically modular task structure in neural networks through pruning and network analysis S. M. Patil Loizos Michael C. Dovrolis 34 0 0 28 May 2023
Emergent Modularity in Pre-trained Transformers Zhengyan Zhang Zhiyuan Zeng Yankai Lin Chaojun Xiao Xiaozhi Wang Xu Han Zhiyuan Liu Ruobing Xie Maosong Sun Jie Zhou MoE 37 23 0 28 May 2023
Similarity of Neural Network Models: A Survey of Functional and Representational Measures Max Klabunde Tobias Schumacher M. Strohmaier Florian Lemmerich 52 64 0 10 May 2023
Seeing is Believing: Brain-Inspired Modular Training for Mechanistic Interpretability Ziming Liu Eric Gan Max Tegmark 21 36 0 04 May 2023
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations Atticus Geiger Zhengxuan Wu Christopher Potts Thomas F. Icard Noah D. Goodman CML 75 98 0 05 Mar 2023
Does Deep Learning Learn to Abstract? A Systematic Probing Framework Shengnan An Zeqi Lin B. Chen Qiang Fu Nanning Zheng Jian-Guang Lou 31 4 0 23 Feb 2023
Modular Deep Learning Jonas Pfeiffer Sebastian Ruder Ivan Vulić E. Ponti MoMe OOD 23 73 0 22 Feb 2023
Break It Down: Evidence for Structural Compositionality in Neural Networks Michael A. Lepori Thomas Serre Ellie Pavlick 33 29 0 26 Jan 2023
Towards Modular Machine Learning Solution Development: Benefits and Trade-offs Samiyuru Menik Lakshmish Ramaswamy 29 4 0 23 Jan 2023
Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models Peter Hase Mohit Bansal Been Kim Asma Ghandeharioun MILM 34 167 0 10 Jan 2023
Modularity through Attention: Efficient Training and Transfer of Language-Conditioned Policies for Robot Manipulation Yifan Zhou Shubham D. Sonawani Mariano Phielipp Simon Stepputtis H. B. Amor LM&Ro 14 27 0 08 Dec 2022
Logical Tasks for Measuring Extrapolation and Rule Comprehension Ippei Fujisawa Ryota Kanai ELM LRM 20 4 0 14 Nov 2022
Training Debiased Subnetworks with Contrastive Weight Pruning Geon Yeong Park Sangmin Lee Sang Wan Lee Jong Chul Ye CML 30 13 0 11 Oct 2022