Understanding the Role of Individual Units in a Deep Neural Network

Proceedings of the National Academy of Sciences of the United States of America (PNAS), 2020

10 September 2020

Jun-Yan Zhu

Antonio Torralba

Papers citing "Understanding the Role of Individual Units in a Deep Neural Network"

50 / 233 papers shown

Mechanistic Finetuning of Vision-Language-Action Models via Few-Shot Demonstrations

27 Nov 2025

Guaranteed Optimal Compositional Explanations for Neurons

Biagio La Rosa

Leilani H. Gilpin

25 Nov 2025

Open Vocabulary Compositional Explanations for Neuron Alignment

Biagio La Rosa

Leilani H. Gilpin

OCL

330

25 Nov 2025

Training Language Models to Explain Their Own Computations

209

11 Nov 2025

Finding Culture-Sensitive Neurons in Vision-Language Models

246

28 Oct 2025

Understanding Multi-View Transformers

28 Oct 2025

TextCAM: Explaining Class Activation Map with Text

119

01 Oct 2025

Granular Concept Circuits: Toward a Fine-Grained Circuit Discovery for Concept Representations

Dahee Kwon

Sehyun Lee

Jaesik Choi

163

03 Aug 2025

Unraveling Hidden Representations: A Multi-Modal Layer Analysis for Better Synthetic Content Forensics

Tom Or

Omri Azencot

AAML

182

01 Aug 2025

Explaining How Visual, Textual and Multimodal Encoders Share Concepts

Clément Cornet

Romaric Besançon

Hervé Le Borgne

153

24 Jul 2025

Escaping Plato's Cave: JAM for Aligning Independently Trained Vision and Language Models

Lauren Hyoseo Yoon

Yisong Yue

Been Kim

363

01 Jul 2025

From Concepts to Components: Concept-Agnostic Attention Module Discovery in Transformers

Jingtong Su

Julia Kempe

Karen Ullrich

268

20 Jun 2025

Evaluating Neuron Explanations: A Unified Framework with Sanity Checks

171

06 Jun 2025

Line of Sight: On Linear Representations in VLLMs

283

05 Jun 2025

Unconditional CNN denoisers contain sparse semantic representation of images

315

02 Jun 2025

P: A Universal Measure of Predictive Intelligence

David Gamez

ELM

104

30 May 2025

Debiasing CLIP: Interpreting and Correcting Bias in Attention Heads

277

23 May 2025

FastCAV: Efficient Computation of Concept Activation Vectors for Explaining Deep Neural Networks

175

23 May 2025

Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models

Ercong Nie

Helmut Schmid

Hinrich Schütze

365

22 May 2025

Explainable embeddings with Distance Explainer

Christiaan Meijer

E. G. Patrick Bos

369

21 May 2025

Out-of-Distribution Detection via Channelwise Feature Aggregation in Neural Network-Based Receivers

372

21 May 2025

Explaining Neural Networks with Reasons

Levin Hornischer

Hannes Leitgeb

FAtt AAML MILM

319

20 May 2025

What's Pulling the Strings? Evaluating Integrity and Attribution in AI Training and Inference through Concept Shift

319

28 Apr 2025

Weight-of-Thought Reasoning: Exploring Neural Network Weights for Enhanced LLM Reasoning

Saif Punjwani

Larry Heck

LRM

235

14 Apr 2025

Neuron-level Balance between Stability and Plasticity in Deep Reinforcement Learning

281

09 Apr 2025

Following the Whispers of Values: Unraveling Neural Mechanisms Behind Value-Oriented Behaviors in LLMs

381

07 Apr 2025

LSNet: See Large, Focus Small

Computer Vision and Pattern Recognition (CVPR), 2025

295

29 Mar 2025

Effective Skill Unlearning through Intervention and Abstention

North American Chapter of the Association for Computational Linguistics (NAACL), 2025

Yongce Li

Chung-En Sun

Tsui-Wei Weng

862

27 Mar 2025

CoE: Chain-of-Explanation via Automatic Visual Concept Circuit Description and Polysemanticity Quantification

Computer Vision and Pattern Recognition (CVPR), 2025

276

19 Mar 2025

Representational Similarity via Interpretable Visual Concepts

International Conference on Learning Representations (ICLR), 2025

976

19 Mar 2025

Post-Hoc Concept Disentanglement: From Correlated to Isolated Concept Representations

250

07 Mar 2025

Superscopes: Amplifying Internal Feature Representations for Language Model Interpretation

Jonathan Jacobi

Gal Niv

LRM ReLM

421

03 Mar 2025

Steered Generation via Gradient Descent on Sparse Features

Sumanta Bhattacharyya

Pedram Rooshenas

LLMSV

287

25 Feb 2025

TinyEmo: Scaling down Emotional Reasoning via Metric Projection

Cristian Gutierrez

LRM

523

17 Feb 2025

Dimensions underlying the representational alignment of deep neural networks with humans

Nature Machine Intelligence (Nat. Mach. Intell.), 2024

388

28 Jan 2025

Faithful Counterfactual Visual Explanations (FCVE)

Knowledge-Based Systems (KBS), 2024

232

12 Jan 2025

Towards Counterfactual and Contrastive Explainability and Transparency of DCNN Image Classifiers

Knowledge-Based Systems (KBS), 2022

306

12 Jan 2025

GPT-2 Through the Lens of Vector Symbolic Architectures

155

10 Dec 2024

Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey

...

425

03 Dec 2024

From CNN to CNN + RNN: Adapting Visualization Techniques for Time-Series Anomaly Detection

Fabien Poirier

AI4TS

242

07 Nov 2024

Probing Ranking LLMs: A Mechanistic Analysis for Information Retrieval

International Conference on the Theory of Information Retrieval (ICTIR), 2024

Tanya Chowdhury

Atharva Nijasure

James Allan

232

24 Oct 2024

Exploiting Text-Image Latent Spaces for the Description of Visual Concepts

International Conference on Pattern Recognition (ICPR), 2024

180

23 Oct 2024

Neuron-based Personality Trait Induction in Large Language Models

238

16 Oct 2024

Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations

International Conference on Learning Representations (ICLR), 2024

403

03 Oct 2024

Linking in Style: Understanding learned features in deep learning models

European Conference on Computer Vision (ECCV), 2024

Maren H. Wehrheim

Pamela Osuna-Vargas

Matthias Kaschube

GAN

184

25 Sep 2024

Unveiling Language Competence Neurons: A Psycholinguistic Approach to Model Interpretability

International Conference on Computational Linguistics (COLING), 2024

Xufeng Duan

Xinyu Zhou

Bei Xiao

Zhenguang G. Cai

MILM

214

24 Sep 2024

Optimal ablation for interpretability

Neural Information Processing Systems (NeurIPS), 2024

Maximilian Li

Lucas Janson

FAtt

343

16 Sep 2024

Unveiling Markov Heads in Pretrained Language Models for Offline Reinforcement Learning

327

11 Sep 2024

How to Measure Human-AI Prediction Accuracy in Explainable AI Systems

...

207

23 Aug 2024

Multilevel Interpretability Of Artificial Neural Networks: Leveraging Framework And Methods From Neuroscience

...

317

22 Aug 2024