Towards falsifiable interpretability research

22 October 2020
Matthew L. Leavitt
Ari S. Morcos
AAML, AI4CE
ArXiv (abs) · PDF · HTML

Papers citing "Towards falsifiable interpretability research"

50 / 52 papers shown
STREAM (ChemBio): A Standard for Transparently Reporting Evaluations in AI Model Reports
Tegan McCaslin
Jide Alaga
Samira Nedungadi
Seth Donoughe
Tom Reed
Rishi Bommasani
Chris Painter
Luca Righetti
262
2
0
13 Aug 2025
Attribution Explanations for Deep Neural Networks: A Theoretical Perspective
Huiqi Deng
Hongbin Pei
Quanshi Zhang
Mengnan Du
FAtt
160
1
0
11 Aug 2025
Enhancing Uncertainty Estimation and Interpretability via Bayesian Non-negative Decision Layer
International Conference on Learning Representations (ICLR), 2025
Xinyue Hu
Zhibin Duan
Bo Chen
Mingyuan Zhou
UQCV, BDL
344
2
0
28 May 2025
A Mathematical Philosophy of Explanations in Mechanistic Interpretability -- The Strange Science Part I.i
Kola Ayonrinde
Louis Jaburi
MILM
486
3
0
01 May 2025
Compositionality Unlocks Deep Interpretable Models
Thomas Dooms
Ward Gauderis
Geraint A. Wiggins
José Oramas
FAtt, CoGe, AI4CE
207
2
0
03 Apr 2025
Mind the Gap: Bridging the Divide Between AI Aspirations and the Reality of Autonomous Characterization
Grace Guinan
Addison Salvador
Michelle A. Smeaton
Andrew Glaws
Hilary Egan
Brian C. Wyatt
Babak Anasori
K. Fiedler
M. Olszta
Steven Spurgeon
315
5
0
25 Feb 2025
Benchmarking XAI Explanations with Human-Aligned Evaluations
Rémi Kazmierczak
Steve Azzolin
Eloise Berthier
Anna Hedström
Patricia Delhomme
...
Goran Frehse
Baptiste Caramiaux
Andrea Passerini
Gianni Franchi
428
5
0
04 Nov 2024
Confident Teacher, Confident Student? A Novel User Study Design for Investigating the Didactic Potential of Explanations and their Impact on Uncertainty
Teodor Chiaburu
Frank Haußer
Felix Bießmann
174
0
0
10 Sep 2024
Auditing Local Explanations is Hard
Robi Bhattacharjee
U. V. Luxburg
LRM, MLAU, FAtt
235
5
0
18 Jul 2024
This Probably Looks Exactly Like That: An Invertible Prototypical Network
Zachariah Carmichael
Timothy Redgrave
Daniel Gonzalez Cedre
Walter J. Scheirer
BDL
313
5
0
16 Jul 2024
An Actionability Assessment Tool for Explainable AI
Ronal Singh
Tim Miller
L. Sonenberg
Eduardo Velloso
F. Vetere
Piers Howe
Paul Dourish
121
2
0
19 Jun 2024
Mechanistic Interpretability for AI Safety -- A Review
Leonard Bereska
E. Gavves
AI4CE
332
293
0
22 Apr 2024
Acoustic characterization of speech rhythm: going beyond metrics with recurrent neural networks
François Deloche
Laurent Bonnasse-Gahot
Judit Gervain
89
0
0
22 Jan 2024
Artificial Neural Nets and the Representation of Human Concepts
Timo Freiesleben
NAI
312
4
0
08 Dec 2023
On the Relationship Between Interpretability and Explainability in Machine Learning
Benjamin Leblanc
Pascal Germain
FaML
398
1
0
20 Nov 2023
Training Dynamics of Contextual N-Grams in Language Models
Lucia Quirke
Lovis Heindrich
Wes Gurnee
Neel Nanda
234
6
0
01 Nov 2023
Explainable Artificial Intelligence (XAI) 2.0: A Manifesto of Open Challenges and Interdisciplinary Research Directions
Information Fusion (Inf. Fusion), 2023
Luca Longo
Mario Brcic
Federico Cabitza
Jaesik Choi
Roberto Confalonieri
...
Andrés Páez
Wojciech Samek
Johannes Schneider
Timo Speith
Simone Stumpf
474
357
0
30 Oct 2023
How Well Do Feature-Additive Explainers Explain Feature-Additive Predictors?
Zachariah Carmichael
Walter J. Scheirer
FAtt
246
8
0
27 Oct 2023
Identifying Interpretable Visual Features in Artificial and Biological Neural Systems
David A. Klindt
Sophia Sanborn
Francisco Acosta
Frédéric Poitevin
Nina Miolane
MILM, FAtt
267
10
0
17 Oct 2023
NeuroInspect: Interpretable Neuron-based Debugging Framework through Class-conditional Visualizations
Yeong-Joon Ju
Ji-Hoon Park
Seong-Whan Lee
AAML
205
0
0
11 Oct 2023
The Blame Problem in Evaluating Local Explanations, and How to Tackle it
Amir Hossein Akhavan Rahnama
ELM, FAtt
241
7
0
05 Oct 2023
Pixel-Grounded Prototypical Part Networks
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Zachariah Carmichael
Suhas Lohit
A. Cherian
Michael Jeffrey Jones
Walter J. Scheirer
297
16
0
25 Sep 2023
The Hydra Effect: Emergent Self-repair in Language Model Computations
Tom McGrath
Matthew Rahtz
János Kramár
Vladimir Mikulik
Shane Legg
MILM, LRM
214
91
0
28 Jul 2023
Scale Alone Does not Improve Mechanistic Interpretability in Vision Models
Neural Information Processing Systems (NeurIPS), 2023
Roland S. Zimmermann
Thomas Klein
Wieland Brendel
264
23
0
11 Jul 2023
Don't trust your eyes: on the (un)reliability of feature visualizations
International Conference on Machine Learning (ICML), 2023
Robert Geirhos
Roland S. Zimmermann
Blair Bilodeau
Wieland Brendel
Been Kim
FAtt, OOD
452
36
0
07 Jun 2023
Language Models Implement Simple Word2Vec-style Vector Arithmetic
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Jack Merullo
Carsten Eickhoff
Ellie Pavlick
KELM
285
85
0
25 May 2023
Causal Analysis for Robust Interpretability of Neural Networks
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Ola Ahmad
Nicolas Béreux
Loïc Baret
V. Hashemi
Freddy Lecue
CML
301
10
0
15 May 2023
Why is plausibility surprisingly problematic as an XAI criterion?
Weina Jin
Xiaoxiao Li
Ghassan Hamarneh
347
10
0
30 Mar 2023
The Representational Status of Deep Learning Models
Eamon Duede
272
3
0
21 Mar 2023
Tracr: Compiled Transformers as a Laboratory for Interpretability
Neural Information Processing Systems (NeurIPS), 2023
David Lindner
János Kramár
Sebastian Farquhar
Matthew Rahtz
Tom McGrath
Vladimir Mikulik
474
87
0
12 Jan 2023
Higher-order mutual information reveals synergistic sub-networks for multi-neuron importance
Kenzo Clauw
S. Stramaglia
Daniele Marinazzo
SSL, FAtt
192
8
0
01 Nov 2022
SoK: Explainable Machine Learning for Computer Security Applications
European Symposium on Security and Privacy (Euro S&P), 2022
A. Nadeem
D. Vos
Clinton Cao
Luca Pajola
Simon Dieck
Robert Baumgartner
S. Verwer
342
63
0
22 Aug 2022
Do We Need Another Explainable AI Method? Toward Unifying Post-hoc XAI Evaluation Methods into an Interactive and Multi-dimensional Benchmark
Mohamed Karim Belaid
Eyke Hüllermeier
Maximilian Rabus
Ralf Krestel
ELM
161
0
0
08 Jun 2022
Additive MIL: Intrinsically Interpretable Multiple Instance Learning for Pathology
Neural Information Processing Systems (NeurIPS), 2022
Syed Ashar Javed
Dinkar Juyal
Harshith Padigela
A. Taylor-Weiner
Limin Yu
Aaditya (Adi) Prakash
188
85
0
03 Jun 2022
Attribution-based Explanations that Provide Recourse Cannot be Robust
Journal of Machine Learning Research (JMLR), 2022
H. Fokkema
R. D. Heide
T. Erven
FAtt
350
22
0
31 May 2022
Features of Explainability: How users understand counterfactual and causal explanations for categorical and continuous features in XAI
Greta Warren
Mark T. Keane
R. Byrne
CML
145
27
0
21 Apr 2022
An explainability framework for cortical surface-based deep learning
Fernanda L. Ribeiro
S. Bollmann
R. Cunnington
A. M. Puckett
FAtt, AAML, MedIm
134
3
0
15 Mar 2022
Investigating the fidelity of explainable artificial intelligence methods for applications of convolutional neural networks in geoscience
Artificial Intelligence for the Earth Systems (AI4ES), 2022
Antonios Mamalakis
E. Barnes
I. Ebert‐Uphoff
224
88
0
07 Feb 2022
Framework for Evaluating Faithfulness of Local Explanations
International Conference on Machine Learning (ICML), 2022
S. Dasgupta
Nave Frost
Michal Moshkovitz
FAtt
400
78
0
01 Feb 2022
From Anecdotal Evidence to Quantitative Evaluation Methods: A Systematic Review on Evaluating Explainable AI
ACM Computing Surveys (ACM CSUR), 2022
Meike Nauta
Jan Trienes
Shreyasi Pathak
Elisa Nguyen
Michelle Peters
Yasmin Schmitt
Jorg Schlotterer
M. V. Keulen
C. Seifert
ELM, XAI
563
555
0
20 Jan 2022
HIVE: Evaluating the Human Interpretability of Visual Explanations
European Conference on Computer Vision (ECCV), 2021
Sunnie S. Y. Kim
Nicole Meister
V. V. Ramaswamy
Ruth C. Fong
Olga Russakovsky
382
130
0
06 Dec 2021
Acquisition of Chess Knowledge in AlphaZero
Thomas McGrath
A. Kapishnikov
Nenad Tomašev
Adam Pearce
Demis Hassabis
Been Kim
Ulrich Paquet
Vladimir Kramnik
405
188
0
17 Nov 2021
Grounding Representation Similarity with Statistical Testing
Frances Ding
Jean-Stanislas Denain
Jacob Steinhardt
200
32
0
03 Aug 2021
How Well do Feature Visualizations Support Causal Understanding of CNN Activations?
Roland S. Zimmermann
Judy Borowski
Robert Geirhos
Matthias Bethge
Thomas S. A. Wallis
Wieland Brendel
FAtt
292
39
0
23 Jun 2021
The effectiveness of feature attribution methods and its correlation with automatic evaluation scores
Neural Information Processing Systems (NeurIPS), 2021
Giang Nguyen
Daeyoung Kim
Anh Totti Nguyen
FAtt
463
104
0
31 May 2021
XAI Handbook: Towards a Unified Framework for Explainable AI
Sebastián M. Palacio
Adriano Lucieri
Mohsin Munir
Jörn Hees
Sheraz Ahmed
Andreas Dengel
125
40
0
14 May 2021
Leveraging Sparse Linear Layers for Debuggable Deep Networks
International Conference on Machine Learning (ICML), 2021
Eric Wong
Shibani Santurkar
Aleksander Madry
FAtt
188
96
0
11 May 2021
Two4Two: Evaluating Interpretable Machine Learning - A Synthetic Dataset For Controlled Experiments
M. Schuessler
Philipp Weiß
Leon Sixt
147
3
0
06 May 2021
Neural Network Attribution Methods for Problems in Geoscience: A Novel Synthetic Benchmark Dataset
Environmental Data Science (EDS), 2021
Antonios Mamalakis
I. Ebert‐Uphoff
E. Barnes
OOD
343
87
0
18 Mar 2021
If Only We Had Better Counterfactual Explanations: Five Key Deficits to Rectify in the Evaluation of Counterfactual XAI Techniques
International Joint Conference on Artificial Intelligence (IJCAI), 2021
Mark T. Keane
Eoin M. Kenny
Eoin Delaney
Barry Smyth
CML
278
165
0
26 Feb 2021