Cited By
Towards falsifiable interpretability research
Matthew L. Leavitt, Ari S. Morcos. 22 October 2020. arXiv:2010.12016. Tags: AAML, AI4CE.
ArXiv (abs) | PDF | HTML
Papers citing "Towards falsifiable interpretability research" (50 / 52 papers shown)
STREAM (ChemBio): A Standard for Transparently Reporting Evaluations in AI Model Reports
Tegan McCaslin, Jide Alaga, Samira Nedungadi, Seth Donoughe, Tom Reed, Rishi Bommasani, Chris Painter, Luca Righetti. 13 Aug 2025.
Attribution Explanations for Deep Neural Networks: A Theoretical Perspective
Huiqi Deng, Hongbin Pei, Quanshi Zhang, Mengnan Du. 11 Aug 2025. Tags: FAtt.
Enhancing Uncertainty Estimation and Interpretability via Bayesian Non-negative Decision Layer
Xinyue Hu, Zhibin Duan, Bo Chen, Mingyuan Zhou. International Conference on Learning Representations (ICLR), 2025. 28 May 2025. Tags: UQCV, BDL.
A Mathematical Philosophy of Explanations in Mechanistic Interpretability -- The Strange Science Part I.i
Kola Ayonrinde, Louis Jaburi. 01 May 2025. Tags: MILM.
Compositionality Unlocks Deep Interpretable Models
Thomas Dooms, Ward Gauderis, Geraint A. Wiggins, José Oramas. 03 Apr 2025. Tags: FAtt, CoGe, AI4CE.
Mind the Gap: Bridging the Divide Between AI Aspirations and the Reality of Autonomous Characterization
Grace Guinan, Addison Salvador, Michelle A. Smeaton, Andrew Glaws, Hilary Egan, Brian C. Wyatt, Babak Anasori, K. Fiedler, M. Olszta, Steven Spurgeon. 25 Feb 2025.
Benchmarking XAI Explanations with Human-Aligned Evaluations
Rémi Kazmierczak, Steve Azzolin, Eloise Berthier, Anna Hedström, Patricia Delhomme, ..., Goran Frehse, Baptiste Caramiaux, Andrea Passerini, Gianni Franchi. 04 Nov 2024.
Confident Teacher, Confident Student? A Novel User Study Design for Investigating the Didactic Potential of Explanations and their Impact on Uncertainty
Teodor Chiaburu, Frank Haußer, Felix Bießmann. 10 Sep 2024.
Auditing Local Explanations is Hard
Robi Bhattacharjee, U. V. Luxburg. 18 Jul 2024. Tags: LRM, MLAU, FAtt.
This Probably Looks Exactly Like That: An Invertible Prototypical Network
Zachariah Carmichael, Timothy Redgrave, Daniel Gonzalez Cedre, Walter J. Scheirer. 16 Jul 2024. Tags: BDL.
An Actionability Assessment Tool for Explainable AI
Ronal Singh, Tim Miller, L. Sonenberg, Eduardo Velloso, F. Vetere, Piers Howe, Paul Dourish. 19 Jun 2024.
Mechanistic Interpretability for AI Safety -- A Review
Leonard Bereska, E. Gavves. 22 Apr 2024. Tags: AI4CE.
Acoustic characterization of speech rhythm: going beyond metrics with recurrent neural networks
François Deloche, Laurent Bonnasse-Gahot, Judit Gervain. 22 Jan 2024.
Artificial Neural Nets and the Representation of Human Concepts
Timo Freiesleben. 08 Dec 2023. Tags: NAI.
On the Relationship Between Interpretability and Explainability in Machine Learning
Benjamin Leblanc, Pascal Germain. 20 Nov 2023. Tags: FaML.
Training Dynamics of Contextual N-Grams in Language Models
Lucia Quirke, Lovis Heindrich, Wes Gurnee, Neel Nanda. 01 Nov 2023.
Explainable Artificial Intelligence (XAI) 2.0: A Manifesto of Open Challenges and Interdisciplinary Research Directions
Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, ..., Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, Simone Stumpf. Information Fusion (Inf. Fusion), 2023. 30 Oct 2023.
How Well Do Feature-Additive Explainers Explain Feature-Additive Predictors?
Zachariah Carmichael, Walter J. Scheirer. 27 Oct 2023. Tags: FAtt.
Identifying Interpretable Visual Features in Artificial and Biological Neural Systems
David A. Klindt, Sophia Sanborn, Francisco Acosta, Frédéric Poitevin, Nina Miolane. 17 Oct 2023. Tags: MILM, FAtt.
NeuroInspect: Interpretable Neuron-based Debugging Framework through Class-conditional Visualizations
Yeong-Joon Ju, Ji-Hoon Park, Seong-Whan Lee. 11 Oct 2023. Tags: AAML.
The Blame Problem in Evaluating Local Explanations, and How to Tackle it
Amir Hossein Akhavan Rahnama. 05 Oct 2023. Tags: ELM, FAtt.
Pixel-Grounded Prototypical Part Networks
Zachariah Carmichael, Suhas Lohit, A. Cherian, Michael Jeffrey Jones, Walter J. Scheirer. IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023. 25 Sep 2023.
The Hydra Effect: Emergent Self-repair in Language Model Computations
Tom McGrath, Matthew Rahtz, János Kramár, Vladimir Mikulik, Shane Legg. 28 Jul 2023. Tags: MILM, LRM.
Scale Alone Does not Improve Mechanistic Interpretability in Vision Models
Roland S. Zimmermann, Thomas Klein, Wieland Brendel. Neural Information Processing Systems (NeurIPS), 2023. 11 Jul 2023.
Don't trust your eyes: on the (un)reliability of feature visualizations
Robert Geirhos, Roland S. Zimmermann, Blair Bilodeau, Wieland Brendel, Been Kim. International Conference on Machine Learning (ICML), 2023. 07 Jun 2023. Tags: FAtt, OOD.
Language Models Implement Simple Word2Vec-style Vector Arithmetic
Jack Merullo, Carsten Eickhoff, Ellie Pavlick. North American Chapter of the Association for Computational Linguistics (NAACL), 2023. 25 May 2023. Tags: KELM.
Causal Analysis for Robust Interpretability of Neural Networks
Ola Ahmad, Nicolas Béreux, Loïc Baret, V. Hashemi, Freddy Lecue. IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023. 15 May 2023. Tags: CML.
Why is plausibility surprisingly problematic as an XAI criterion?
Weina Jin, Xiaoxiao Li, Ghassan Hamarneh. 30 Mar 2023.
The Representational Status of Deep Learning Models
Eamon Duede. 21 Mar 2023.
Tracr: Compiled Transformers as a Laboratory for Interpretability
David Lindner, János Kramár, Sebastian Farquhar, Matthew Rahtz, Tom McGrath, Vladimir Mikulik. Neural Information Processing Systems (NeurIPS), 2023. 12 Jan 2023.
Higher-order mutual information reveals synergistic sub-networks for multi-neuron importance
Kenzo Clauw, S. Stramaglia, Daniele Marinazzo. 01 Nov 2022. Tags: SSL, FAtt.
SoK: Explainable Machine Learning for Computer Security Applications
A. Nadeem, D. Vos, Clinton Cao, Luca Pajola, Simon Dieck, Robert Baumgartner, S. Verwer. European Symposium on Security and Privacy (Euro S&P), 2022. 22 Aug 2022.
Do We Need Another Explainable AI Method? Toward Unifying Post-hoc XAI Evaluation Methods into an Interactive and Multi-dimensional Benchmark
Mohamed Karim Belaid, Eyke Hüllermeier, Maximilian Rabus, Ralf Krestel. 08 Jun 2022. Tags: ELM.
Additive MIL: Intrinsically Interpretable Multiple Instance Learning for Pathology
Syed Ashar Javed, Dinkar Juyal, Harshith Padigela, A. Taylor-Weiner, Limin Yu, Aaditya (Adi) Prakash. Neural Information Processing Systems (NeurIPS), 2022. 03 Jun 2022.
Attribution-based Explanations that Provide Recourse Cannot be Robust
H. Fokkema, R. D. Heide, T. Erven. Journal of Machine Learning Research (JMLR), 2022. 31 May 2022. Tags: FAtt.
Features of Explainability: How users understand counterfactual and causal explanations for categorical and continuous features in XAI
Greta Warren, Mark T. Keane, R. Byrne. 21 Apr 2022. Tags: CML.
An explainability framework for cortical surface-based deep learning
Fernanda L. Ribeiro, S. Bollmann, R. Cunnington, A. M. Puckett. 15 Mar 2022. Tags: FAtt, AAML, MedIm.
Investigating the fidelity of explainable artificial intelligence methods for applications of convolutional neural networks in geoscience
Antonios Mamalakis, E. Barnes, I. Ebert-Uphoff. Artificial Intelligence for the Earth Systems (AI4ES), 2022. 07 Feb 2022.
Framework for Evaluating Faithfulness of Local Explanations
S. Dasgupta, Nave Frost, Michal Moshkovitz. International Conference on Machine Learning (ICML), 2022. 01 Feb 2022. Tags: FAtt.
From Anecdotal Evidence to Quantitative Evaluation Methods: A Systematic Review on Evaluating Explainable AI
Meike Nauta, Jan Trienes, Shreyasi Pathak, Elisa Nguyen, Michelle Peters, Yasmin Schmitt, Jorg Schlotterer, M. V. Keulen, C. Seifert. ACM Computing Surveys (ACM CSUR), 2022. 20 Jan 2022. Tags: ELM, XAI.
HIVE: Evaluating the Human Interpretability of Visual Explanations
Sunnie S. Y. Kim, Nicole Meister, V. V. Ramaswamy, Ruth C. Fong, Olga Russakovsky. European Conference on Computer Vision (ECCV), 2022. 06 Dec 2021.
Acquisition of Chess Knowledge in AlphaZero
Thomas McGrath, A. Kapishnikov, Nenad Tomašev, Adam Pearce, Demis Hassabis, Been Kim, Ulrich Paquet, Vladimir Kramnik. 17 Nov 2021.
Grounding Representation Similarity with Statistical Testing
Frances Ding, Jean-Stanislas Denain, Jacob Steinhardt. 03 Aug 2021.
How Well do Feature Visualizations Support Causal Understanding of CNN Activations?
Roland S. Zimmermann, Judy Borowski, Robert Geirhos, Matthias Bethge, Thomas S. A. Wallis, Wieland Brendel. 23 Jun 2021. Tags: FAtt.
The effectiveness of feature attribution methods and its correlation with automatic evaluation scores
Giang Nguyen, Daeyoung Kim, Anh Totti Nguyen. Neural Information Processing Systems (NeurIPS), 2021. 31 May 2021. Tags: FAtt.
XAI Handbook: Towards a Unified Framework for Explainable AI
Sebastián M. Palacio, Adriano Lucieri, Mohsin Munir, Jörn Hees, Sheraz Ahmed, Andreas Dengel. 14 May 2021.
Leveraging Sparse Linear Layers for Debuggable Deep Networks
Eric Wong, Shibani Santurkar, Aleksander Madry. International Conference on Machine Learning (ICML), 2021. 11 May 2021. Tags: FAtt.
Two4Two: Evaluating Interpretable Machine Learning - A Synthetic Dataset For Controlled Experiments
M. Schuessler, Philipp Weiß, Leon Sixt. 06 May 2021.
Neural Network Attribution Methods for Problems in Geoscience: A Novel Synthetic Benchmark Dataset
Antonios Mamalakis, I. Ebert-Uphoff, E. Barnes. Environmental Data Science (EDS), 2021. 18 Mar 2021. Tags: OOD.
If Only We Had Better Counterfactual Explanations: Five Key Deficits to Rectify in the Evaluation of Counterfactual XAI Techniques
Mark T. Keane, Eoin M. Kenny, Eoin Delaney, Barry Smyth. International Joint Conference on Artificial Intelligence (IJCAI), 2021. 26 Feb 2021. Tags: CML.