Network Dissection: Quantifying Interpretability of Deep Visual Representations [MILM, FAtt]
David Bau, Bolei Zhou, A. Khosla, A. Oliva, Antonio Torralba
19 April 2017
Papers citing "Network Dissection: Quantifying Interpretability of Deep Visual Representations" (50 of 842 papers shown)
Mechanistic Finetuning of Vision-Language-Action Models via Few-Shot Demonstrations
Chancharik Mitra, Yusen Luo, Raj Saravanan, Dantong Niu, Anirudh Pai, Jesse Thomason, Trevor Darrell, Abrar Anwar, Deva Ramanan, Roei Herzig
27 Nov 2025

Auxiliary Metrics Help Decoding Skill Neurons in the Wild
Yixiu Zhao, Xiaozhi Wang, Zijun Yao, Lei Hou, Juanzi Li
26 Nov 2025

Guaranteed Optimal Compositional Explanations for Neurons
Biagio La Rosa, Leilani H. Gilpin
25 Nov 2025

Open Vocabulary Compositional Explanations for Neuron Alignment [OCL]
Biagio La Rosa, Leilani H. Gilpin
25 Nov 2025

Interpreting GFlowNets for Drug Discovery: Extracting Actionable Insights for Medicinal Chemistry
Amirtha Varshini A S, Duminda S. Ranasinghe, Hok Hei Tam
24 Nov 2025

LAYA: Layer-wise Attention Aggregation for Interpretable Depth-Aware Neural Networks [FAtt]
Gennaro Vessio
16 Nov 2025

Probing the Probes: Methods and Metrics for Concept Alignment [LLMSV]
Jacob Lysnæs-Larsen, Marte Eggen, Inga Strümke
06 Nov 2025

LLEXICORP: End-user Explainability of Convolutional Neural Networks
Vojtěch Kůr, Adam Bajger, Adam Kukučka, Marek Hradil, Vít Musil, Tomáš Brázdil
04 Nov 2025

Atlas-Alignment: Making Interpretability Transferable Across Language Models [LLMSV]
Bruno Puri, J. Berend, Sebastian Lapuschkin, Wojciech Samek
31 Oct 2025

ConceptScope: Characterizing Dataset Bias via Disentangled Visual Concepts
Jinho Choi, Hyesu Lim, Steffen Schneider, Jaegul Choo
30 Oct 2025

Finding Culture-Sensitive Neurons in Vision-Language Models [VLM]
Xiutian Zhao, Rochelle Choenni, Rohit Saxena, Ivan Titov
28 Oct 2025

Enhancing Pre-trained Representation Classifiability can Boost its Interpretability [FAtt]
Shufan Shen, Zhaobo Qi, Junshu Sun, Qingming Huang, Qi Tian, Shuhui Wang
International Conference on Learning Representations (ICLR), 2025
28 Oct 2025

A Video Is Not Worth a Thousand Words
Sam Pollard, Michael Wray
27 Oct 2025

Scaling Non-Parametric Sampling with Representation [DiffM]
Vincent Lu, Aaron Truong, Zeyu Yun, Yubei Chen
25 Oct 2025

Automated Detection of Visual Attribute Reliance with a Self-Reflective Agent
Christy Li, Josep Lopez Camunas, Jake Thomas Touchet, Jacob Andreas, Àgata Lapedriza, Antonio Torralba, Tamar Rott Shaham
24 Oct 2025

EdgeSync: Accelerating Edge-Model Updates for Data Drift through Adaptive Continuous Learning
Runchu Donga, Peng Zhao, Guiqin Wang, Nan Qi, Jie Lin
18 Oct 2025

Neologism Learning for Controllability and Self-Verbalization [NAI]
John Hewitt, Oyvind Tafjord, Robert Geirhos, Been Kim
09 Oct 2025

Encode, Think, Decode: Scaling Test-Time Reasoning with Recursive Latent Thoughts [LRM]
Yeskendir Koishekenov, Aldo Lipani, Nicola Cancedda
08 Oct 2025

Mysteries of the Deep: Role of Intermediate Representations in Out-of-Distribution Detection [OODD]
I. M. De la Jara, C. Rodriguez-Opazo, D. Teney, D. Ranasinghe, E. Abbasnejad
07 Oct 2025

Semantic Regexes: Auto-Interpreting LLM Features with a Structured Language
Angie Boggust, Donghao Ren, Yannick Assogba, Dominik Moritz, Arvind Satyanarayan, Fred Hohman
07 Oct 2025

Take Goodhart Seriously: Principled Limit on General-Purpose AI Optimization
Antoine Maier, Aude Maier, Tom David
03 Oct 2025

Attack Logics, Not Outputs: Towards Efficient Robustification of Deep Neural Networks by Falsifying Concept-Based Properties [AAML]
Raik Dankworth, Gesina Schwalbe
01 Oct 2025

Mechanistic Interpretability as Statistical Estimation: A Variance Analysis of EAP-IG
Maxime Méloux, François Portet, Maxime Peyrard
01 Oct 2025

TextCAM: Explaining Class Activation Map with Text [VLM]
Qiming Zhao, Xingjian Li, Xiaoyu Cao, Xiaolong Wu, Min Xu
01 Oct 2025

Object-Centric Case-Based Reasoning via Argumentation
Gabriel de Olim Gaul, Adam Gould, Avinash Kori, Francesca Toni
30 Sep 2025

Nonparametric Identification of Latent Concepts
Yujia Zheng, Shaoan Xie, Kun Zhang
30 Sep 2025

Interpret, Prune and Distill Donut: Towards Lightweight VLMs for VQA on Documents
Adnan Ben Mansour, Ayoub Karine, D. Naccache
30 Sep 2025

CE-FAM: Concept-Based Explanation via Fusion of Activation Maps
Michihiro Kuroki, T. Yamasaki
28 Sep 2025

On the Variability of Concept Activation Vectors [AAML]
Julia Wenkmann, Damien Garreau
28 Sep 2025

REMA: A Unified Reasoning Manifold Framework for Interpreting Large Language Models [LRM]
Bo Li, Guanzhi Deng, Ronghao Chen, Junrong Yue, Shuo Zhang, Qinghua Zhao, Linqi Song, Lijie Wen
26 Sep 2025

Interpreting ResNet-based CLIP via Neuron-Attention Decomposition
Edmund Bu, Yossi Gandelsman
24 Sep 2025

Redefining Experts: Interpretable Decomposition of Language Models for Toxicity Mitigation
Zuhair Hasan Shaik, Abdullah Mazhar, Aseem Srivastava, Md. Shad Akhtar
20 Sep 2025

V-CECE: Visual Counterfactual Explanations via Conceptual Edits
Nikolaos Spanos, Maria Lymperaiou, Giorgos Filandrianos, Konstantinos Thomas, Athanasios Voulodimos, Giorgos Stamou
20 Sep 2025

Which Direction to Choose? An Analysis on the Representation Power of Self-Supervised ViTs in Downstream Tasks
Yannis Kaltampanidis, Alexandros Doumanoglou, D. Zarpalas
18 Sep 2025

NeuroStrike: Neuron-Level Attacks on Aligned LLMs [AAML]
Lichao Wu, Sasha Behrouzi, Mohamadreza Rostami, Maximilian Thang, S. Picek, A. Sadeghi
15 Sep 2025

Discovering Divergent Representations between Text-to-Image Models [EGVM]
Lisa Dunlap, Joseph E. Gonzalez, Trevor Darrell, Fabian Caba Heilbron, Josef Sivic, Bryan C. Russell
10 Sep 2025

Superposition in Graph Neural Networks [GNN]
Lukas Pertl, Han Xuanyuan, Pietro Lio
31 Aug 2025

GCAV: A Global Concept Activation Vector Framework for Cross-Layer Consistency in Interpretability
Zhenghao He, Sanchit Sinha, Guangzhi Xiong, Aidong Zhang
28 Aug 2025

NM-Hebb: Coupling Local Hebbian Plasticity with Metric Learning for More Accurate and Interpretable CNNs
Davorin Miličević, Ratko Grbić
27 Aug 2025

Disentangling Polysemantic Neurons with a Null-Calibrated Polysemanticity Index and Causal Patch Interventions [MILM]
Manan Gupta, Dhruv Kumar
23 Aug 2025

Evaluating Sparse Autoencoders for Monosemantic Representation
Moghis Fereidouni, Muhammad Umair Haider, Peizhong Ju, A.B. Siddique
20 Aug 2025

Integrating Attention into Explanation Frameworks for Language and Vision Transformers
Marte Eggen, Jacob Lysnæs-Larsen, Inga Strümke
12 Aug 2025

Granular Concept Circuits: Toward a Fine-Grained Circuit Discovery for Concept Representations
Dahee Kwon, Sehyun Lee, Jaesik Choi
03 Aug 2025

Eigen Neural Network: Unlocking Generalizable Vision with Eigenbasis
Anzhe Cheng, Chenzhong Yin, Mingxi Cheng, Shukai Duan, Shahin Nazarian, Paul Bogdan
02 Aug 2025

Detection Transformers Under the Knife: A Neuroscience-Inspired Approach to Ablations [MedIm]
Nils Hütten, Florian Hölken, Hasan Tercan, Tobias Meisen
29 Jul 2025

Compositional Function Networks: A High-Performance Alternative to Deep Neural Networks with Built-in Interpretability
Fang Li
28 Jul 2025

Emergence of Quantised Representations Isolated to Anisotropic Functions
George Bird
16 Jul 2025

Escaping Plato's Cave: JAM for Aligning Independently Trained Vision and Language Models
Lauren Hyoseo Yoon, Yisong Yue, Been Kim
01 Jul 2025

When Concept-Based XAI Is Imprecise: Do People Distinguish between Generalisations and Misrepresentations?
Romy Müller
22 Jun 2025

From Concepts to Components: Concept-Agnostic Attention Module Discovery in Transformers
Jingtong Su, Julia Kempe, Karen Ullrich
20 Jun 2025