The elephant in the interpretability room: Why use attention as explanation when we have saliency methods?

12 October 2020

Papers citing "The elephant in the interpretability room: Why use attention as explanation when we have saliency methods?"

50 / 99 papers shown

Concise and Sufficient Sub-Sentence Citations for Retrieval-Augmented Generation

199

25 Sep 2025

Cross-Attention is Half Explanation in Speech-to-Text Models

181

22 Sep 2025

SalaMAnder: Shapley-based Mathematical Expression Attribution and Metric for Chain-of-Thought Reasoning

146

20 Sep 2025

Attribution-guided Pruning for Compression, Circuit Discovery, and Targeted Correction in LLMs

Sayed Mohammad Vakilzadeh Hatefi

269

16 Jun 2025

On the reliability of feature attribution methods for speech classification

424

22 May 2025

The Atlas of In-Context Learning: How Attention Heads Shape In-Context Retrieval Augmentation

388

21 May 2025

Unveiling Knowledge Utilization Mechanisms in LLM-based Retrieval-Augmented Generation. Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025

232

17 May 2025

Enabling Global, Human-Centered Explanations for LLMs: From Tokens to Interpretable Code and Test Generation

David Nader-Palacio

Dipin Khati

Daniel Rodríguez-Cárdenas

Alejandro Velasco

Denys Poshyvanyk

LRM

338

21 Mar 2025

Regularization, Semi-supervision, and Supervision for a Plausible Attention-Based Explanation. International Conference on Applications of Natural Language to Data Bases (NLDB), 2025

291

22 Jan 2025

Normalized AOPC: Fixing Misleading Faithfulness Metrics for Feature Attribution Explainability. Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Joakim Edin

Andreas Geert Motzfeldt

Casper L. Christensen

Tuukka Ruotsalo

Lars Maaløe

Maria Maistro

485

15 Aug 2024

Validating Mechanistic Interpretations: An Axiomatic Approach

314

18 Jul 2024

A look under the hood of the Interactive Deep Learning Enterprise (No-IDLE)

313

27 Jun 2024

Interpretability Needs a New Paradigm

Andreas Madsen

Himabindu Lakkaraju

Siva Reddy

Sarath Chandar

211

08 May 2024

Unraveling the Dilemma of AI Errors: Exploring the Effectiveness of Human and Machine Explanations for Large Language Models

Marvin Pafla

Kate Larson

Mark Hancock

230

11 Apr 2024

LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models

Igor Tufanov

267

10 Apr 2024

On the Faithfulness of Vision Transformer Explanations

Yan Yan

294

01 Apr 2024

Towards Explainability in Legal Outcome Prediction Models

Josef Valvoda

Robert Bamler

ELM AILaw

321

25 Mar 2024

Comparing Explanation Faithfulness between Multilingual and Monolingual Fine-tuned Language Models

Zhixue Zhao

Nikolaos Aletras

259

19 Mar 2024

Detecting Hallucination and Coverage Errors in Retrieval Augmented Generation for Controversial Topics. International Conference on Language Resources and Evaluation (LREC), 2024

Kathleen Meier-Hellstern

Lucas Dixon

316

13 Mar 2024

Information Flow Routes: Automatically Interpreting Language Models at Scale

Javier Ferrando

Elena Voita

395

27 Feb 2024

Attention Meets Post-hoc Interpretability: A Mathematical Perspective. International Conference on Machine Learning (ICML), 2024

Gianluigi Lopardo

F. Precioso

Damien Garreau

267

05 Feb 2024

Approximate Attributions for Off-the-Shelf Siamese Transformers. Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2024

Lucas Moller

Dmitry Nikolaev

Sebastian Padó

268

05 Feb 2024

ReAGent: A Model-agnostic Feature Attribution Method for Generative Language Models

Zhixue Zhao

Boxuan Shan

343

01 Feb 2024

XAI for In-hospital Mortality Prediction via Multimodal ICU Data

Bo Du

189

29 Dec 2023

Attribution and Alignment: Effects of Local Context Repetition on Utterance Production and Comprehension in Dialogue. Conference on Computational Natural Language Learning (CoNLL), 2023

259

21 Nov 2023

Sum-of-Parts: Self-Attributing Neural Networks with End-to-End Learning of Feature Groups

439

25 Oct 2023

REFER: An End-to-end Rationale Extraction Framework for Explanation Regularization. Conference on Computational Natural Language Learning (CoNLL), 2023

Mohammad Reza Ghasemi Madani

Pasquale Minervini

246

22 Oct 2023

An Interpretable Deep-Learning Framework for Predicting Hospital Readmissions From Electronic Health Records

402

16 Oct 2023

An Attribution Method for Siamese Encoders. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Lucas Moller

Dmitry Nikolaev

Sebastian Padó

408

09 Oct 2023

Quantifying the Plausibility of Context Reliance in Neural Machine Translation. International Conference on Learning Representations (ICLR), 2023

316

02 Oct 2023

Attention Sorting Combats Recency Bias In Long Context Language Models

A. Peysakhovich

Adam Lerer

LRM RALM

339

28 Sep 2023

Exploring Different Levels of Supervision for Detecting and Localizing Solar Panels on Remote Sensing Imagery

Maarten Burger

R. Wijnhoven

Shaodi You

207

19 Sep 2023

Unsupervised Text Style Transfer with Deep Generative Models

Zhongtao Jiang

Yuanzhe Zhang

Yiming Ju

Kang Liu

282

31 Aug 2023

Decoding Layer Saliency in Language Transformers. International Conference on Machine Learning (ICML), 2023

Elizabeth M. Hou

Greg Castañón

MILM

291

09 Aug 2023

ALens: An Adaptive Domain-Oriented Abstract Writing Training Tool for Novice Researchers

246

08 Aug 2023

Did the Models Understand Documents? Benchmarking Models for Language Understanding in Document-Level Relation Extraction. Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Haotian Chen

Bingsheng Chen

Xiangdong Zhou

282

20 Jun 2023

B-cos Alignment for Inherently Interpretable CNNs and Vision Transformers. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

Moritz D Boehle

Navdeeppal Singh

Mario Fritz

Bernt Schiele

405

19 Jun 2023

Using Sequences of Life-events to Predict Human Lives. Nature Computational Science (Nat. Comput. Sci.), 2023

259

05 Jun 2023

DecompX: Explaining Transformers Decisions by Propagating Token Decomposition. Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Ali Modarressi

Mohsen Fayyaz

Ehsan Aghazadeh

Yadollah Yaghoobzadeh

Mohammad Taher Pilehvar

324

05 Jun 2023

AI Transparency in the Age of LLMs: A Human-Centered Research Roadmap

Q. V. Liao

J. Vaughan

413

244

02 Jun 2023

HalOmi: A Manually Annotated Benchmark for Multilingual Hallucination and Omission Detection in Machine Translation. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

397

19 May 2023

Incorporating Attribution Importance for Improving Faithfulness Metrics. Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Zhixue Zhao

Nikolaos Aletras

407

17 May 2023

AD-KD: Attribution-Driven Knowledge Distillation for Language Model Compression. Annual Meeting of the Association for Computational Linguistics (ACL), 2023

412

17 May 2023

ConvXAI: Delivering Heterogeneous AI Explanations via Conversations to Support Human-AI Scientific Writing

Hua Shen

Huang Chieh-Yang

Tongshuang Wu

Ting-Hao 'Kenneth' Huang

506

16 May 2023

Dissecting Recall of Factual Associations in Auto-Regressive Language Models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

843

438

28 Apr 2023

Evaluating self-attention interpretability through human-grounded experimental protocol

193

27 Mar 2023

Holistically Explainable Vision Transformers

Moritz D Boehle

Mario Fritz

Bernt Schiele

ViT

313

20 Jan 2023

Opti-CAM: Optimizing saliency maps for interpretability. Computer Vision and Image Understanding (CVIU), 2023

563

17 Jan 2023

DExT: Detector Explanation Toolkit

Deepan Padmanabhan

Paul G. Plöger

Octavio Arriaga

Matias Valdenegro-Toro

221

21 Dec 2022

Human-Guided Fair Classification for Natural Language Processing. International Conference on Learning Representations (ICLR), 2022

Martin Vechev

296

20 Dec 2022