Network Dissection: Quantifying Interpretability of Deep Visual Representations [MILM, FAtt]
David Bau, Bolei Zhou, A. Khosla, A. Oliva, Antonio Torralba
19 April 2017
Papers citing "Network Dissection: Quantifying Interpretability of Deep Visual Representations" (50 of 842 papers shown)
Mechanistic Finetuning of Vision-Language-Action Models via Few-Shot Demonstrations
Chancharik Mitra, Yusen Luo, Raj Saravanan, Dantong Niu, Anirudh Pai, Jesse Thomason, Trevor Darrell, Abrar Anwar, Deva Ramanan, Roei Herzig
27 Nov 2025

Auxiliary Metrics Help Decoding Skill Neurons in the Wild
Yixiu Zhao, Xiaozhi Wang, Zijun Yao, Lei Hou, Juanzi Li
26 Nov 2025

Guaranteed Optimal Compositional Explanations for Neurons
Biagio La Rosa, Leilani H. Gilpin
25 Nov 2025

Open Vocabulary Compositional Explanations for Neuron Alignment [OCL]
Biagio La Rosa, Leilani H. Gilpin
25 Nov 2025

Interpreting GFlowNets for Drug Discovery: Extracting Actionable Insights for Medicinal Chemistry
Amirtha Varshini A S, Duminda S. Ranasinghe, Hok Hei Tam
24 Nov 2025

LAYA: Layer-wise Attention Aggregation for Interpretable Depth-Aware Neural Networks [FAtt]
Gennaro Vessio
16 Nov 2025

Probing the Probes: Methods and Metrics for Concept Alignment [LLMSV]
Jacob Lysnæs-Larsen, Marte Eggen, Inga Strümke
06 Nov 2025

LLEXICORP: End-user Explainability of Convolutional Neural Networks
Vojtěch Kůr, Adam Bajger, Adam Kukučka, Marek Hradil, Vít Musil, Tomáš Brázdil
04 Nov 2025

Atlas-Alignment: Making Interpretability Transferable Across Language Models [LLMSV]
Bruno Puri, J. Berend, Sebastian Lapuschkin, Wojciech Samek
31 Oct 2025

ConceptScope: Characterizing Dataset Bias via Disentangled Visual Concepts
Jinho Choi, Hyesu Lim, Steffen Schneider, Jaegul Choo
30 Oct 2025

Finding Culture-Sensitive Neurons in Vision-Language Models [VLM]
Xiutian Zhao, Rochelle Choenni, Rohit Saxena, Ivan Titov
28 Oct 2025

Enhancing Pre-trained Representation Classifiability can Boost its Interpretability [FAtt]
Shufan Shen, Zhaobo Qi, Junshu Sun, Qingming Huang, Qi Tian, Shuhui Wang
International Conference on Learning Representations (ICLR), 2025
28 Oct 2025

A Video Is Not Worth a Thousand Words
Sam Pollard, Michael Wray
27 Oct 2025

Scaling Non-Parametric Sampling with Representation [DiffM]
Vincent Lu, Aaron Truong, Zeyu Yun, Yubei Chen
25 Oct 2025

Automated Detection of Visual Attribute Reliance with a Self-Reflective Agent
Christy Li, Josep Lopez Camunas, Jake Thomas Touchet, Jacob Andreas, Àgata Lapedriza, Antonio Torralba, Tamar Rott Shaham
24 Oct 2025

EdgeSync: Accelerating Edge-Model Updates for Data Drift through Adaptive Continuous Learning
Runchu Donga, Peng Zhao, Guiqin Wang, Nan Qi, Jie Lin
18 Oct 2025

Neologism Learning for Controllability and Self-Verbalization [NAI]
John Hewitt, Oyvind Tafjord, Robert Geirhos, Been Kim
09 Oct 2025

Encode, Think, Decode: Scaling Test-Time Reasoning with Recursive Latent Thoughts [LRM]
Yeskendir Koishekenov, Aldo Lipani, Nicola Cancedda
08 Oct 2025

Mysteries of the Deep: Role of Intermediate Representations in Out-of-Distribution Detection [OODD]
I. M. De la Jara, C. Rodriguez-Opazo, D. Teney, D. Ranasinghe, E. Abbasnejad
07 Oct 2025

Semantic Regexes: Auto-Interpreting LLM Features with a Structured Language
Angie Boggust, Donghao Ren, Yannick Assogba, Dominik Moritz, Arvind Satyanarayan, Fred Hohman
07 Oct 2025

Take Goodhart Seriously: Principled Limit on General-Purpose AI Optimization
Antoine Maier, Aude Maier, Tom David
03 Oct 2025

Attack Logics, Not Outputs: Towards Efficient Robustification of Deep Neural Networks by Falsifying Concept-Based Properties [AAML]
Raik Dankworth, Gesina Schwalbe
01 Oct 2025

Mechanistic Interpretability as Statistical Estimation: A Variance Analysis of EAP-IG
Maxime Méloux, François Portet, Maxime Peyrard
01 Oct 2025

TextCAM: Explaining Class Activation Map with Text [VLM]
Qiming Zhao, Xingjian Li, Xiaoyu Cao, Xiaolong Wu, Min Xu
01 Oct 2025

Object-Centric Case-Based Reasoning via Argumentation
Gabriel de Olim Gaul, Adam Gould, Avinash Kori, Francesca Toni
30 Sep 2025

Nonparametric Identification of Latent Concepts
Yujia Zheng, Shaoan Xie, Kun Zhang
30 Sep 2025

Interpret, Prune and Distill Donut: Towards Lightweight VLMs for VQA on Documents
Adnan Ben Mansour, Ayoub Karine, D. Naccache
30 Sep 2025

CE-FAM: Concept-Based Explanation via Fusion of Activation Maps
Michihiro Kuroki, T. Yamasaki
28 Sep 2025

On the Variability of Concept Activation Vectors [AAML]
Julia Wenkmann, Damien Garreau
28 Sep 2025

REMA: A Unified Reasoning Manifold Framework for Interpreting Large Language Models [LRM]
Bo Li, Guanzhi Deng, Ronghao Chen, Junrong Yue, Shuo Zhang, Qinghua Zhao, Linqi Song, Lijie Wen
26 Sep 2025

Interpreting ResNet-based CLIP via Neuron-Attention Decomposition
Edmund Bu, Yossi Gandelsman
24 Sep 2025

Redefining Experts: Interpretable Decomposition of Language Models for Toxicity Mitigation
Zuhair Hasan Shaik, Abdullah Mazhar, Aseem Srivastava, Md. Shad Akhtar
20 Sep 2025

V-CECE: Visual Counterfactual Explanations via Conceptual Edits
Nikolaos Spanos, Maria Lymperaiou, Giorgos Filandrianos, Konstantinos Thomas, Athanasios Voulodimos, Giorgos Stamou
20 Sep 2025

Which Direction to Choose? An Analysis on the Representation Power of Self-Supervised ViTs in Downstream Tasks
Yannis Kaltampanidis, Alexandros Doumanoglou, D. Zarpalas
18 Sep 2025

NeuroStrike: Neuron-Level Attacks on Aligned LLMs [AAML]
Lichao Wu, Sasha Behrouzi, Mohamadreza Rostami, Maximilian Thang, S. Picek, A. Sadeghi
15 Sep 2025

Discovering Divergent Representations between Text-to-Image Models [EGVM]
Lisa Dunlap, Joseph E. Gonzalez, Trevor Darrell, Fabian Caba Heilbron, Josef Sivic, Bryan C. Russell
10 Sep 2025

Superposition in Graph Neural Networks [GNN]
Lukas Pertl, Han Xuanyuan, Pietro Lio
31 Aug 2025

GCAV: A Global Concept Activation Vector Framework for Cross-Layer Consistency in Interpretability
Zhenghao He, Sanchit Sinha, Guangzhi Xiong, Aidong Zhang
28 Aug 2025

NM-Hebb: Coupling Local Hebbian Plasticity with Metric Learning for More Accurate and Interpretable CNNs
Davorin Miličević, Ratko Grbić
27 Aug 2025

Disentangling Polysemantic Neurons with a Null-Calibrated Polysemanticity Index and Causal Patch Interventions [MILM]
Manan Gupta, Dhruv Kumar
23 Aug 2025

Evaluating Sparse Autoencoders for Monosemantic Representation
Moghis Fereidouni, Muhammad Umair Haider, Peizhong Ju, A.B. Siddique
20 Aug 2025

Integrating Attention into Explanation Frameworks for Language and Vision Transformers
Marte Eggen, Jacob Lysnæs-Larsen, Inga Strümke
12 Aug 2025

Granular Concept Circuits: Toward a Fine-Grained Circuit Discovery for Concept Representations
Dahee Kwon, Sehyun Lee, Jaesik Choi
03 Aug 2025

Eigen Neural Network: Unlocking Generalizable Vision with Eigenbasis
Anzhe Cheng, Chenzhong Yin, Mingxi Cheng, Shukai Duan, Shahin Nazarian, Paul Bogdan
02 Aug 2025

Detection Transformers Under the Knife: A Neuroscience-Inspired Approach to Ablations [MedIm]
Nils Hütten, Florian Hölken, Hasan Tercan, Tobias Meisen
29 Jul 2025

Compositional Function Networks: A High-Performance Alternative to Deep Neural Networks with Built-in Interpretability
Fang Li
28 Jul 2025

Emergence of Quantised Representations Isolated to Anisotropic Functions
George Bird
16 Jul 2025

Escaping Plato's Cave: JAM for Aligning Independently Trained Vision and Language Models
Lauren Hyoseo Yoon, Yisong Yue, Been Kim
01 Jul 2025

When Concept-Based XAI Is Imprecise: Do People Distinguish between Generalisations and Misrepresentations?
Romy Müller
22 Jun 2025

From Concepts to Components: Concept-Agnostic Attention Module Discovery in Transformers
Jingtong Su, Julia Kempe, Karen Ullrich
20 Jun 2025