Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)
arXiv 1711.11279 · 30 November 2017
Been Kim, Martin Wattenberg, Justin Gilmer, Carrie J. Cai, James Wexler, F. Viégas, Rory Sayres
[FAtt]
Papers citing "Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)" (50 of 1,045 papers shown)
- Re-evaluating Theory of Mind evaluation in large language models. Jennifer Hu, Felix Sosa, T. Ullman. 28 Feb 2025.
- Obtaining Example-Based Explanations from Deep Neural Networks. Genghua Dong, Henrik Bostrom, Michalis Vazirgiannis, Roman Bresson. 27 Feb 2025. [TDI, FAtt, XAI]
- Interpreting CLIP with Hierarchical Sparse Autoencoders. Vladimir Zaigrajew, Hubert Baniecki, P. Biecek. 27 Feb 2025.
- QPM: Discrete Optimization for Globally Interpretable Image Classification. Thomas Norrenbrock, T. Kaiser, Sovan Biswas, R. Manuvinakurike, Bodo Rosenhahn. 27 Feb 2025.
- Show and Tell: Visually Explainable Deep Neural Nets via Spatially-Aware Concept Bottleneck Models. Itay Benou, Tammy Riklin-Raviv. 27 Feb 2025.
- BarkXAI: A Lightweight Post-Hoc Explainable Method for Tree Species Classification with Quantifiable Concepts. Yunmei Huang, Songlin Hou, Zachary Nelson Horve, Songlin Fei. 26 Feb 2025.
- Can LLMs Explain Themselves Counterfactually? Zahra Dehghanighobadi, Asja Fischer, Muhammad Bilal Zafar. 25 Feb 2025. [LRM]
- NeurFlow: Interpreting Neural Networks through Neuron Groups and Functional Interactions. Tue Cao, Nhat X. Hoang, Hieu H. Pham, P. Nguyen, My T. Thai. 22 Feb 2025.
- Language Models Can Predict Their Own Behavior. Dhananjay Ashok, Jonathan May. 18 Feb 2025. [ReLM, AI4TS, LRM]
- Abstraction Alignment: Comparing Model-Learned and Human-Encoded Conceptual Relationships. Angie Boggust, Hyemin Bang, Hendrik Strobelt, Arvindmani Satyanarayan. 17 Feb 2025.
- SAIF: A Sparse Autoencoder Framework for Interpreting and Steering Instruction Following of Language Models. Z. He, Haiyan Zhao, Yiran Qiao, Fan Yang, Ali Payani, Jing Ma, Mengnan Du. 17 Feb 2025. [LLMSV]
- From Text to Trust: Empowering AI-assisted Decision Making with Adaptive LLM-powered Analysis. Zhuoyan Li, Hangxiao Zhu, Zhuoran Lu, Ziang Xiao, Ming Yin. 17 Feb 2025.
- Suboptimal Shapley Value Explanations. Xiaolei Lu. 17 Feb 2025. [FAtt]
- Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models. Samuel Stevens, Wei-Lun Chao, T. Berger-Wolf, Yu-Chuan Su. 10 Feb 2025. [VLM]
- Sample-efficient Learning of Concepts with Theoretical Guarantees: from Data to Concepts without Interventions. H. Fokkema, T. Erven, Sara Magliacane. 10 Feb 2025.
- Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment. Harrish Thasarathan, Julian Forsyth, Thomas Fel, M. Kowal, Konstantinos G. Derpanis. 06 Feb 2025.
- CoRPA: Adversarial Image Generation for Chest X-rays Using Concept Vector Perturbations and Generative Models. Amy Rafferty, Rishi Ramaesh, Ajitha Rajan. 04 Feb 2025. [MedIm, AAML]
- Compositional Concept-Based Neuron-Level Interpretability for Deep Reinforcement Learning. Zeyu Jiang, Hai Huang, Xingquan Zuo. 02 Feb 2025. [OffRL]
- Efficient and Interpretable Neural Networks Using Complex Lehmer Transform. M. Ataei, Xiaogang Wang. 28 Jan 2025.
- Faithful Counterfactual Visual Explanations (FCVE). Bismillah Khan, Syed Ali Tariq, Tehseen Zia, Muhammad Ahsan, David Windridge. 12 Jan 2025.
- Towards Counterfactual and Contrastive Explainability and Transparency of DCNN Image Classifiers. Syed Ali Tariq, Tehseen Zia, Mubeen Ghafoor. 12 Jan 2025. [AAML]
- COMIX: Compositional Explanations using Prototypes. S. Sivaprasad, D. Kangin, Plamen Angelov, Mario Fritz. 10 Jan 2025.
- ConSim: Measuring Concept-Based Explanations' Effectiveness with Automated Simulatability. Antonin Poché, Alon Jacovi, Agustin Picard, Victor Boutin, Fanny Jourdan. 10 Jan 2025.
- Interpreting Deep Neural Network-Based Receiver Under Varying Signal-To-Noise Ratios. Marko Tuononen, Dani Korpi, Ville Hautamäki. 10 Jan 2025. [FAtt]
- Explaining the Behavior of Black-Box Prediction Algorithms with Causal Learning. Numair Sani, Daniel Malinsky, I. Shpitser. 10 Jan 2025. [CML]
- Analyzing Fine-tuning Representation Shift for Multimodal LLMs Steering alignment. Pegah Khayatan, Mustafa Shukor, Jayneel Parekh, Matthieu Cord. 06 Jan 2025. [LLMSV]
- Label-free Concept Based Multiple Instance Learning for Gigapixel Histopathology. Susu Sun, Leslie Tessier, Frédérique Meeuwsen, Clément Grisi, Dominique van Midden, G. Litjens, Christian F. Baumgartner. 06 Jan 2025.
- Accurate Explanation Model for Image Classifiers using Class Association Embedding. Ruitao Xie, Jingbang Chen, Limai Jiang, Rui Xiao, Yi-Lun Pan, Yunpeng Cai. 31 Dec 2024.
- Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models. Konstantin Donhauser, Kristina Ulicna, Gemma Elyse Moran, Aditya Ravuri, Kian Kenyon-Dean, Cian Eastwood, Jason Hartford. 20 Dec 2024.
- Adaptive Concept Bottleneck for Foundation Models Under Distribution Shifts. Jihye Choi, Jayaram Raghuram, Yixuan Li, Somesh Jha. 18 Dec 2024.
- Concept-ROT: Poisoning Concepts in Large Language Models with Model Editing. Keltin Grimes, Marco Christiani, David Shriver, Marissa Connor. 17 Dec 2024. [KELM]
- Concept Learning in the Wild: Towards Algorithmic Understanding of Neural Networks. Elad Shoham, Hadar Cohen, Khalil Wattad, Havana Rika, Dan Vilenchik. 15 Dec 2024.
- UCDR-Adapter: Exploring Adaptation of Pre-Trained Vision-Language Models for Universal Cross-Domain Retrieval. Haoyu Jiang, Zhi-Qi Cheng, Gabriel Moreira, Jiawen Zhu, Jingdong Sun, Bukun Ren, Jun-Yan He, Qi Dai, Xian-Sheng Hua. 14 Dec 2024. [VLM]
- OMENN: One Matrix to Explain Neural Networks. Adam Wróbel, Mikołaj Janusz, Bartosz Zieliński, Dawid Rymarczyk. 03 Dec 2024. [FAtt, AAML]
- Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey. Yunkai Dang, Kaichen Huang, Jiahao Huo, Yibo Yan, S. Huang, ..., Kun Wang, Yong Liu, Jing Shao, Hui Xiong, Xuming Hu. 03 Dec 2024. [LRM]
- Explaining the Impact of Training on Vision Models via Activation Clustering. Ahcène Boubekki, Samuel G. Fadel, Sebastian Mair. 29 Nov 2024.
- Revisiting Marr in Face: The Building of 2D–2.5D–3D Representations in Deep Neural Networks. Xiangyu Zhu, Chang Yu, Jiankuo Zhao, Zhaoxiang Zhang, Stan Z. Li, Zhen Lei. 25 Nov 2024. [3DV]
- FG-CXR: A Radiologist-Aligned Gaze Dataset for Enhancing Interpretability in Chest X-Ray Report Generation. Trong-Thang Pham, Ngoc-Vuong Ho, Nhat-Tan Bui, T. Phan, Patel Brijesh, ..., Gianfranco Doretto, Anh Nguyen, Carol C. Wu, Hien Nguyen, Ngan Le. 23 Nov 2024.
- GIFT: A Framework for Global Interpretable Faithful Textual Explanations of Vision Classifiers. Éloi Zablocki, Valentin Gerard, Amaia Cardiel, Eric Gaussier, Matthieu Cord, Eduardo Valle. 23 Nov 2024.
- DEBUG-HD: Debugging TinyML models on-device using Hyper-Dimensional computing. Nikhil P Ghanathe, Steven J E Wilton. 16 Nov 2024.
- Explainable Artificial Intelligence for Medical Applications: A Review. Qiyang Sun, Alican Akman, Björn Schuller. 15 Nov 2024.
- Towards Utilising a Range of Neural Activations for Comprehending Representational Associations. Laura O'Mahony, Nikola S. Nikolov, David JP O'Sullivan. 15 Nov 2024.
- Classification with Conceptual Safeguards. Hailey Joren, Charles Marx, Berk Ustun. 07 Nov 2024.
- Local vs distributed representations: What is the right basis for interpretability? Julien Colin, L. Goetschalckx, Thomas Fel, Victor Boutin, Jay Gopal, Thomas Serre, Nuria Oliver. 06 Nov 2024. [HAI]
- Decision Trees for Interpretable Clusters in Mixture Models and Deep Representations. Maximilian Fleissner, Maedeh Zarvandi, D. Ghoshdastidar. 03 Nov 2024.
- Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models. Aashiq Muhamed, Mona Diab, Virginia Smith. 01 Nov 2024.
- Beyond Accuracy: Ensuring Correct Predictions With Correct Rationales. Tang Li, Mengmeng Ma, Xi Peng. 31 Oct 2024.
- All or None: Identifiable Linear Properties of Next-token Predictors in Language Modeling. Emanuele Marconato, Sébastien Lachapelle, Sebastian Weichwald, Luigi Gresele. 30 Oct 2024.
- Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Transformers. Shaobo Wang, Hongxuan Tang, Mingyang Wang, H. Zhang, Xuyang Liu, Weiya Li, Xuming Hu, Linfeng Zhang. 29 Oct 2024.
- Bridging the Gap between Expert and Language Models: Concept-guided Chess Commentary Generation and Evaluation. Jaechang Kim, Jinmin Goh, Inseok Hwang, Jaewoong Cho, Jungseul Ok. 28 Oct 2024. [ELM]