
Learning to Deceive with Attention-Based Explanations

Annual Meeting of the Association for Computational Linguistics (ACL), 2019

17 September 2019

Graham Neubig

Papers citing "Learning to Deceive with Attention-Based Explanations"

50 / 109 papers shown

Bridging Fairness and Explainability: Can Input-Based Explanations Promote Fairness in Hate Speech Detection?

281

26 Sep 2025

Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models

509

01 Apr 2025

B-cos LM: Efficiently Transforming Pre-trained Language Models for Improved Explainability

338

18 Feb 2025

Regularization, Semi-supervision, and Supervision for a Plausible Attention-Based Explanation

International Conference on Applications of Natural Language to Data Bases (NLDB), 2025

348

22 Jan 2025

Explanation Regularisation through the Lens of Attributions

Pedro Ferreira

Wilker Aziz

Ivan Titov

611

23 Jul 2024

They Look Like Each Other: Case-based Reasoning for Explainable Depression Detection on Twitter using Large Language Models

Mohammad Saeid Mahdavinejad

Peyman Adibi

A. Monadjemi

Pascal Hitzler

354

21 Jul 2024

Validating Mechanistic Interpretations: An Axiomatic Approach

378

18 Jul 2024

InternalInspector I²: Robust Confidence Estimation in LLMs through Internal States

Ming Jin

Lifu Huang

300

17 Jun 2024

PEACH: Pretrained-embedding Explanation Across Contextual and Hierarchical Structure

Feiqi Cao

S. Han

Hyunsuk Chung

337

21 Apr 2024

Towards a Framework for Evaluating Explanations in Automated Fact Verification

Neema Kotonya

Francesca Toni

327

29 Mar 2024

From Explainable to Interpretable Deep Learning for Natural Language Processing in Healthcare: How Far from Reality?

Computational and Structural Biotechnology Journal (CSBJ), 2024

350

18 Mar 2024

RORA: Robust Free-Text Rationale Evaluation

Daniel Khashabi

315

28 Feb 2024

CMA-R: Causal Mediation Analysis for Explaining Rumour Detection

Lin Tian

Xiuzhen Zhang

Jey Han Lau

317

13 Feb 2024

SoK: Taming the Triangle -- On the Interplays between Fairness, Interpretability and Privacy in Machine Learning

362

22 Dec 2023

Interpretability Illusions in the Generalization of Simplified Models

399

06 Dec 2023

How Well Do Feature-Additive Explainers Explain Feature-Additive Predictors?

Zachariah Carmichael

Walter J. Scheirer

FAtt

304

27 Oct 2023

REFER: An End-to-end Rationale Extraction Framework for Explanation Regularization

Conference on Computational Natural Language Learning (CoNLL), 2023

Mohammad Reza Ghasemi Madani

Pasquale Minervini

312

22 Oct 2023

Make Your Decision Convincing! A Unified Two-Stage Framework: Self-Attribution and Decision-Making

205

20 Oct 2023

Why bother with geometry? On the relevance of linear decompositions of Transformer embeddings

BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023

Timothee Mickus

Ananda Sreenidhi

257

10 Oct 2023

Evaluating Explanation Methods for Vision-and-Language Navigation

European Conference on Artificial Intelligence (ECAI), 2023

Jia Pan

290

10 Oct 2023

Towards Better Chain-of-Thought Prompting Strategies: A Survey

505

08 Oct 2023

ViT-ReciproCAM: Gradient and Attention-Free Visual Explanations for Vision Transformer

Seokhyun Byun

Won-Jo Lee

FAtt

261

04 Oct 2023

Goodhart's Law Applies to NLP's Explanation Benchmarks

Findings (Findings), 2023

Jennifer Hsia

Danish Pruthi

Aarti Singh

Zachary Chase Lipton

259

28 Aug 2023

Decoding Layer Saliency in Language Transformers

International Conference on Machine Learning (ICML), 2023

Elizabeth M. Hou

Greg Castañón

MILM

342

09 Aug 2023

R-Cut: Enhancing Explainability in Vision Transformers with Relationship Weighted Out and Cut

Italian National Conference on Sensors (INS), 2023

Ming Ding

184

18 Jul 2023

A Novel Counterfactual Data Augmentation Method for Aspect-Based Sentiment Analysis

Asian Conference on Machine Learning (ACML), 2023

252

20 Jun 2023

Genomic Interpreter: A Hierarchical Genomic Deep Neural Network with 1D Shifted Window Transformer

272

08 Jun 2023

Robust Natural Language Understanding with Residual Attention Debiasing

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Fei Wang

202

28 May 2023

Explaining How Transformers Use Context to Build Predictions

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

196

21 May 2023

COCKATIEL: COntinuous Concept ranKed ATtribution with Interpretable ELements for explaining neural net classifiers on NLP tasks

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

295

11 May 2023

Faithful Chain-of-Thought Reasoning

International Joint Conference on Natural Language Processing (IJCNLP), 2023

Marianna Apidianaki

640

366

31 Jan 2023

Tensions Between the Proxies of Human Values in AI

243

14 Dec 2022

MEGAN: Multi-Explanation Graph Attention Network

232

23 Nov 2022

ViT-CX: Causal Explanation of Vision Transformers

International Joint Conference on Artificial Intelligence (IJCAI), 2022

429

06 Nov 2022

Salience Allocation as Guidance for Abstractive Summarization

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Fei Wang

Wenlin Yao

206

22 Oct 2022

Beyond Model Interpretability: On the Faithfulness and Adversarial Robustness of Contrastive Textual Explanations

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Julia El Zini

M. Awad

AAML

237

17 Oct 2022

StyLEx: Explaining Style Using Human Lexical Annotations

Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022

Shirley Anugrah Hayati

436

14 Oct 2022

On the Explainability of Natural Language Processing Deep Models

ACM Computing Surveys (ACM CSUR), 2022

Julia El Zini

M. Awad

312

119

13 Oct 2022

Explanations, Fairness, and Appropriate Reliance in Human-AI Decision-Making

International Conference on Human Factors in Computing Systems (CHI), 2022

552

23 Sep 2022

Towards Faithful Model Explanation in NLP: A Survey

Computational Linguistics (CL), 2022

Qing Lyu

Marianna Apidianaki

Chris Callison-Burch

XAI

640

189

22 Sep 2022

Looking for a Needle in a Haystack: A Comprehensive Study of Hallucinations in Neural Machine Translation

Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022

Nuno M. Guerreiro

Elena Voita

André F. T. Martins

HILM

367

10 Aug 2022

Interpretable by Design: Learning Predictors by Composing Interpretable Queries

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

307

03 Jul 2022

How to Dissect a Muppet: The Structure of Transformer Embedding Spaces

Transactions of the Association for Computational Linguistics (TACL), 2022

Timothee Mickus

Denis Paperno

Mathieu Constant

313

07 Jun 2022

On the Relationship Between Explanations, Fairness Perceptions, and Decisions

305

27 Apr 2022

Grad-SAM: Explaining Transformers via Gradient Self-Attention Maps

International Conference on Information and Knowledge Management (CIKM), 2021

275

23 Apr 2022

The Risks of Machine Learning Systems

Samson Tan

Araz Taeihagh

K. Baxter

170

21 Apr 2022

ProtoTEx: Explaining Model Decisions with Prototype Tensors

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Matthew Lease

234

11 Apr 2022

Interpretation of Black Box NLP Models: A Survey

255

31 Mar 2022

Measuring the Mixing of Contextual Information in the Transformer

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Javier Ferrando

Gerard I. Gállego

Marta R. Costa-jussà

374

08 Mar 2022

Hierarchical Interpretation of Neural Text Classification

Computational Linguistics (CL), 2022

Hanqi Yan

Lin Gui

Yulan He

396

20 Feb 2022