Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)
arXiv 1711.11279 · 30 November 2017
Been Kim, Martin Wattenberg, Justin Gilmer, Carrie J. Cai, James Wexler, F. Viégas, Rory Sayres
[FAtt]
Papers citing "Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)" (50 of 1,045 papers shown)
- Re-evaluating Theory of Mind evaluation in large language models. Jennifer Hu, Felix Sosa, T. Ullman. 28 Feb 2025.
- Obtaining Example-Based Explanations from Deep Neural Networks. Genghua Dong, Henrik Bostrom, Michalis Vazirgiannis, Roman Bresson. 27 Feb 2025. [TDI, FAtt, XAI]
- Interpreting CLIP with Hierarchical Sparse Autoencoders. Vladimir Zaigrajew, Hubert Baniecki, P. Biecek. 27 Feb 2025.
- QPM: Discrete Optimization for Globally Interpretable Image Classification. Thomas Norrenbrock, T. Kaiser, Sovan Biswas, R. Manuvinakurike, Bodo Rosenhahn. 27 Feb 2025.
- Show and Tell: Visually Explainable Deep Neural Nets via Spatially-Aware Concept Bottleneck Models. Itay Benou, Tammy Riklin-Raviv. 27 Feb 2025.
- BarkXAI: A Lightweight Post-Hoc Explainable Method for Tree Species Classification with Quantifiable Concepts. Yunmei Huang, Songlin Hou, Zachary Nelson Horve, Songlin Fei. 26 Feb 2025.
- Can LLMs Explain Themselves Counterfactually? Zahra Dehghanighobadi, Asja Fischer, Muhammad Bilal Zafar. 25 Feb 2025. [LRM]
- NeurFlow: Interpreting Neural Networks through Neuron Groups and Functional Interactions. Tue Cao, Nhat X. Hoang, Hieu H. Pham, P. Nguyen, My T. Thai. 22 Feb 2025.
- Language Models Can Predict Their Own Behavior. Dhananjay Ashok, Jonathan May. 18 Feb 2025. [ReLM, AI4TS, LRM]
- Abstraction Alignment: Comparing Model-Learned and Human-Encoded Conceptual Relationships. Angie Boggust, Hyemin Bang, Hendrik Strobelt, Arvindmani Satyanarayan. 17 Feb 2025.
- SAIF: A Sparse Autoencoder Framework for Interpreting and Steering Instruction Following of Language Models. Z. He, Haiyan Zhao, Yiran Qiao, Fan Yang, Ali Payani, Jing Ma, Mengnan Du. 17 Feb 2025. [LLMSV]
- From Text to Trust: Empowering AI-assisted Decision Making with Adaptive LLM-powered Analysis. Zhuoyan Li, Hangxiao Zhu, Zhuoran Lu, Ziang Xiao, Ming Yin. 17 Feb 2025.
- Suboptimal Shapley Value Explanations. Xiaolei Lu. 17 Feb 2025. [FAtt]
- Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models. Samuel Stevens, Wei-Lun Chao, T. Berger-Wolf, Yu-Chuan Su. 10 Feb 2025. [VLM]
- Sample-efficient Learning of Concepts with Theoretical Guarantees: from Data to Concepts without Interventions. H. Fokkema, T. Erven, Sara Magliacane. 10 Feb 2025.
- Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment. Harrish Thasarathan, Julian Forsyth, Thomas Fel, M. Kowal, Konstantinos G. Derpanis. 06 Feb 2025.
- CoRPA: Adversarial Image Generation for Chest X-rays Using Concept Vector Perturbations and Generative Models. Amy Rafferty, Rishi Ramaesh, Ajitha Rajan. 04 Feb 2025. [MedIm, AAML]
- Compositional Concept-Based Neuron-Level Interpretability for Deep Reinforcement Learning. Zeyu Jiang, Hai Huang, Xingquan Zuo. 02 Feb 2025. [OffRL]
- Efficient and Interpretable Neural Networks Using Complex Lehmer Transform. M. Ataei, Xiaogang Wang. 28 Jan 2025.
- Faithful Counterfactual Visual Explanations (FCVE). Bismillah Khan, Syed Ali Tariq, Tehseen Zia, Muhammad Ahsan, David Windridge. 12 Jan 2025.
- Towards Counterfactual and Contrastive Explainability and Transparency of DCNN Image Classifiers. Syed Ali Tariq, Tehseen Zia, Mubeen Ghafoor. 12 Jan 2025. [AAML]
- COMIX: Compositional Explanations using Prototypes. S. Sivaprasad, D. Kangin, Plamen Angelov, Mario Fritz. 10 Jan 2025.
- ConSim: Measuring Concept-Based Explanations' Effectiveness with Automated Simulatability. Antonin Poché, Alon Jacovi, Agustin Picard, Victor Boutin, Fanny Jourdan. 10 Jan 2025.
- Interpreting Deep Neural Network-Based Receiver Under Varying Signal-To-Noise Ratios. Marko Tuononen, Dani Korpi, Ville Hautamäki. 10 Jan 2025. [FAtt]
- Explaining the Behavior of Black-Box Prediction Algorithms with Causal Learning. Numair Sani, Daniel Malinsky, I. Shpitser. 10 Jan 2025. [CML]
- Analyzing Fine-tuning Representation Shift for Multimodal LLMs Steering alignment. Pegah Khayatan, Mustafa Shukor, Jayneel Parekh, Matthieu Cord. 06 Jan 2025. [LLMSV]
- Label-free Concept Based Multiple Instance Learning for Gigapixel Histopathology. Susu Sun, Leslie Tessier, Frédérique Meeuwsen, Clément Grisi, Dominique van Midden, G. Litjens, Christian F. Baumgartner. 06 Jan 2025.
- Accurate Explanation Model for Image Classifiers using Class Association Embedding. Ruitao Xie, Jingbang Chen, Limai Jiang, Rui Xiao, Yi-Lun Pan, Yunpeng Cai. 31 Dec 2024.
- Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models. Konstantin Donhauser, Kristina Ulicna, Gemma Elyse Moran, Aditya Ravuri, Kian Kenyon-Dean, Cian Eastwood, Jason Hartford. 20 Dec 2024.
- Adaptive Concept Bottleneck for Foundation Models Under Distribution Shifts. Jihye Choi, Jayaram Raghuram, Yixuan Li, Somesh Jha. 18 Dec 2024.
- Concept-ROT: Poisoning Concepts in Large Language Models with Model Editing. Keltin Grimes, Marco Christiani, David Shriver, Marissa Connor. 17 Dec 2024. [KELM]
- Concept Learning in the Wild: Towards Algorithmic Understanding of Neural Networks. Elad Shoham, Hadar Cohen, Khalil Wattad, Havana Rika, Dan Vilenchik. 15 Dec 2024.
- UCDR-Adapter: Exploring Adaptation of Pre-Trained Vision-Language Models for Universal Cross-Domain Retrieval. Haoyu Jiang, Zhi-Qi Cheng, Gabriel Moreira, Jiawen Zhu, Jingdong Sun, Bukun Ren, Jun-Yan He, Qi Dai, Xian-Sheng Hua. 14 Dec 2024. [VLM]
- OMENN: One Matrix to Explain Neural Networks. Adam Wróbel, Mikołaj Janusz, Bartosz Zieliński, Dawid Rymarczyk. 03 Dec 2024. [FAtt, AAML]
- Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey. Yunkai Dang, Kaichen Huang, Jiahao Huo, Yibo Yan, S. Huang, ..., Kun Wang, Yong Liu, Jing Shao, Hui Xiong, Xuming Hu. 03 Dec 2024. [LRM]
- Explaining the Impact of Training on Vision Models via Activation Clustering. Ahcène Boubekki, Samuel G. Fadel, Sebastian Mair. 29 Nov 2024.
- Revisiting Marr in Face: The Building of 2D–2.5D–3D Representations in Deep Neural Networks. Xiangyu Zhu, Chang Yu, Jiankuo Zhao, Zhaoxiang Zhang, Stan Z. Li, Zhen Lei. 25 Nov 2024. [3DV]
- FG-CXR: A Radiologist-Aligned Gaze Dataset for Enhancing Interpretability in Chest X-Ray Report Generation. Trong-Thang Pham, Ngoc-Vuong Ho, Nhat-Tan Bui, T. Phan, Patel Brijesh, ..., Gianfranco Doretto, Anh Nguyen, Carol C. Wu, Hien Nguyen, Ngan Le. 23 Nov 2024.
- GIFT: A Framework for Global Interpretable Faithful Textual Explanations of Vision Classifiers. Éloi Zablocki, Valentin Gerard, Amaia Cardiel, Eric Gaussier, Matthieu Cord, Eduardo Valle. 23 Nov 2024.
- DEBUG-HD: Debugging TinyML models on-device using Hyper-Dimensional computing. Nikhil P Ghanathe, Steven J E Wilton. 16 Nov 2024.
- Explainable Artificial Intelligence for Medical Applications: A Review. Qiyang Sun, Alican Akman, Björn Schuller. 15 Nov 2024.
- Towards Utilising a Range of Neural Activations for Comprehending Representational Associations. Laura O'Mahony, Nikola S. Nikolov, David JP O'Sullivan. 15 Nov 2024.
- Classification with Conceptual Safeguards. Hailey Joren, Charles Marx, Berk Ustun. 07 Nov 2024.
- Local vs distributed representations: What is the right basis for interpretability? Julien Colin, L. Goetschalckx, Thomas Fel, Victor Boutin, Jay Gopal, Thomas Serre, Nuria Oliver. 06 Nov 2024. [HAI]
- Decision Trees for Interpretable Clusters in Mixture Models and Deep Representations. Maximilian Fleissner, Maedeh Zarvandi, D. Ghoshdastidar. 03 Nov 2024.
- Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models. Aashiq Muhamed, Mona Diab, Virginia Smith. 01 Nov 2024.
- Beyond Accuracy: Ensuring Correct Predictions With Correct Rationales. Tang Li, Mengmeng Ma, Xi Peng. 31 Oct 2024.
- All or None: Identifiable Linear Properties of Next-token Predictors in Language Modeling. Emanuele Marconato, Sébastien Lachapelle, Sebastian Weichwald, Luigi Gresele. 30 Oct 2024.
- Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Transformers. Shaobo Wang, Hongxuan Tang, Mingyang Wang, H. Zhang, Xuyang Liu, Weiya Li, Xuming Hu, Linfeng Zhang. 29 Oct 2024.
- Bridging the Gap between Expert and Language Models: Concept-guided Chess Commentary Generation and Evaluation. Jaechang Kim, Jinmin Goh, Inseok Hwang, Jaewoong Cho, Jungseul Ok. 28 Oct 2024. [ELM]