1909.07913
Learning to Deceive with Attention-Based Explanations
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
17 September 2019
Danish Pruthi
Mansi Gupta
Bhuwan Dhingra
Graham Neubig
Zachary Chase Lipton
Papers citing "Learning to Deceive with Attention-Based Explanations"
50 / 109 papers shown
Bridging Fairness and Explainability: Can Input-Based Explanations Promote Fairness in Hate Speech Detection?
Yifan Wang
Mayank Jobanputra
Ji-Ung Lee
Soyoung Oh
Isabel Valera
Vera Demberg
281
1
0
26 Sep 2025
Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models
Guy Kaplan
Michael Toker
Yuval Reif
Yonatan Belinkov
Roy Schwartz
DiffM
508
2
0
01 Apr 2025
B-cos LM: Efficiently Transforming Pre-trained Language Models for Improved Explainability
Yifan Wang
Sukrut Rao
Ji-Ung Lee
Mayank Jobanputra
Vera Demberg
337
0
0
18 Feb 2025
Regularization, Semi-supervision, and Supervision for a Plausible Attention-Based Explanation
International Conference on Applications of Natural Language to Data Bases (NLDB), 2025
Duc Hau Nguyen
Cyrielle Mallart
Guillaume Gravier
Pascale Sébillot
345
1
0
22 Jan 2025
Explanation Regularisation through the Lens of Attributions
Pedro Ferreira
Wilker Aziz
Ivan Titov
611
2
0
23 Jul 2024
They Look Like Each Other: Case-based Reasoning for Explainable Depression Detection on Twitter using Large Language Models
Mohammad Saeid Mahdavinejad
Peyman Adibi
A. Monadjemi
Pascal Hitzler
350
1
0
21 Jul 2024
Validating Mechanistic Interpretations: An Axiomatic Approach
Nils Palumbo
Ravi Mangal
Zifan Wang
Saranya Vijayakumar
Corina S. Pasareanu
Somesh Jha
378
1
0
18 Jul 2024
InternalInspector $I^2$: Robust Confidence Estimation in LLMs through Internal States
Mohammad Beigi
Ying Shen
Runing Yang
Zihao Lin
Qifan Wang
Ankith Mohan
Jianfeng He
Ming Jin
Chang-Tien Lu
Lifu Huang
HILM
300
23
0
17 Jun 2024
PEACH: Pretrained-embedding Explanation Across Contextual and Hierarchical Structure
Feiqi Cao
S. Han
Hyunsuk Chung
334
0
0
21 Apr 2024
Towards a Framework for Evaluating Explanations in Automated Fact Verification
Neema Kotonya
Francesca Toni
326
9
0
29 Mar 2024
From Explainable to Interpretable Deep Learning for Natural Language Processing in Healthcare: How Far from Reality?
Computational and Structural Biotechnology Journal (CSBJ), 2024
Guangming Huang
Yingya Li
Shoaib Jameel
Yunfei Long
G. Papanastasiou
350
47
0
18 Mar 2024
RORA: Robust Free-Text Rationale Evaluation
Zhengping Jiang
Yining Lu
Hanjie Chen
Daniel Khashabi
Benjamin Van Durme
Anqi Liu
315
7
0
28 Feb 2024
CMA-R: Causal Mediation Analysis for Explaining Rumour Detection
Lin Tian
Xiuzhen Zhang
Jey Han Lau
317
0
0
13 Feb 2024
SoK: Taming the Triangle -- On the Interplays between Fairness, Interpretability and Privacy in Machine Learning
Julien Ferry
Ulrich Aïvodji
Sébastien Gambs
Marie-José Huguet
Mohamed Siala
FaML
362
7
0
22 Dec 2023
Interpretability Illusions in the Generalization of Simplified Models
Dan Friedman
Andrew Kyle Lampinen
Lucas Dixon
Danqi Chen
Asma Ghandeharioun
399
20
0
06 Dec 2023
How Well Do Feature-Additive Explainers Explain Feature-Additive Predictors?
Zachariah Carmichael
Walter J. Scheirer
FAtt
304
9
0
27 Oct 2023
REFER: An End-to-end Rationale Extraction Framework for Explanation Regularization
Conference on Computational Natural Language Learning (CoNLL), 2023
Mohammad Reza Ghasemi Madani
Pasquale Minervini
310
5
0
22 Oct 2023
Make Your Decision Convincing! A Unified Two-Stage Framework: Self-Attribution and Decision-Making
Yanrui Du
Sendong Zhao
Hao Wang
Yuhan Chen
Rui Bai
Zewen Qiang
Muzhen Cai
Bing Qin
205
1
0
20 Oct 2023
Why bother with geometry? On the relevance of linear decompositions of Transformer embeddings
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023
Timothee Mickus
Ananda Sreenidhi
257
3
0
10 Oct 2023
Evaluating Explanation Methods for Vision-and-Language Navigation
European Conference on Artificial Intelligence (ECAI), 2023
Guanqi Chen
Lei Yang
Guanhua Chen
Jia Pan
XAI
290
1
0
10 Oct 2023
Towards Better Chain-of-Thought Prompting Strategies: A Survey
Zihan Yu
Liang He
Zhen Wu
Xinyu Dai
Jiajun Chen
LRM
502
90
0
08 Oct 2023
ViT-ReciproCAM: Gradient and Attention-Free Visual Explanations for Vision Transformer
Seokhyun Byun
Won-Jo Lee
FAtt
261
10
0
04 Oct 2023
Goodhart's Law Applies to NLP's Explanation Benchmarks
Findings (Findings), 2023
Jennifer Hsia
Danish Pruthi
Aarti Singh
Zachary Chase Lipton
259
8
0
28 Aug 2023
Decoding Layer Saliency in Language Transformers
International Conference on Machine Learning (ICML), 2023
Elizabeth M. Hou
Greg Castañón
MILM
342
4
0
09 Aug 2023
R-Cut: Enhancing Explainability in Vision Transformers with Relationship Weighted Out and Cut
Italian National Conference on Sensors (INS), 2023
Yingjie Niu
Ming Ding
Maoning Ge
Robin Karlsson
Yuxiao Zhang
K. Takeda
ViT
184
6
0
18 Jul 2023
A Novel Counterfactual Data Augmentation Method for Aspect-Based Sentiment Analysis
Asian Conference on Machine Learning (ACML), 2023
Dongming Wu
Lulu Wen
Chao Chen
Zhaoshu Shi
252
6
0
20 Jun 2023
Genomic Interpreter: A Hierarchical Genomic Deep Neural Network with 1D Shifted Window Transformer
Zehui Li
Akashaditya Das
W. Beardall
Yiren Zhao
Guy-Bart Stan
272
6
0
08 Jun 2023
Robust Natural Language Understanding with Residual Attention Debiasing
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Fei Wang
James Y. Huang
Tianyi Yan
Wenxuan Zhou
Muhao Chen
202
13
0
28 May 2023
Explaining How Transformers Use Context to Build Predictions
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Javier Ferrando
Gerard I. Gállego
Ioannis Tsiamas
Marta R. Costa-jussá
196
54
0
21 May 2023
COCKATIEL: COntinuous Concept ranKed ATtribution with Interpretable ELements for explaining neural net classifiers on NLP tasks
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Fanny Jourdan
Agustin Picard
Thomas Fel
Laurent Risser
Jean-Michel Loubes
Nicholas M. Asher
295
17
0
11 May 2023
Faithful Chain-of-Thought Reasoning
International Joint Conference on Natural Language Processing (IJCNLP), 2023
Qing Lyu
Shreya Havaldar
Adam Stein
Li Zhang
D. Rao
Eric Wong
Marianna Apidianaki
Chris Callison-Burch
ReLM
LRM
640
366
0
31 Jan 2023
Tensions Between the Proxies of Human Values in AI
Teresa Datta
D. Nissani
Max Cembalest
Akash Khanna
Haley Massa
John P. Dickerson
243
4
0
14 Dec 2022
MEGAN: Multi-Explanation Graph Attention Network
Jonas Teufel
Luca Torresi
Patrick Reiser
Pascal Friederich
232
9
0
23 Nov 2022
ViT-CX: Causal Explanation of Vision Transformers
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Weiyan Xie
Xiao-hui Li
Caleb Chen Cao
Nevin L. Zhang
ViT
429
38
0
06 Nov 2022
Salience Allocation as Guidance for Abstractive Summarization
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Fei Wang
Kaiqiang Song
Hongming Zhang
Lifeng Jin
Sangwoo Cho
Wenlin Yao
Xiaoyang Wang
Muhao Chen
Dong Yu
206
43
0
22 Oct 2022
Beyond Model Interpretability: On the Faithfulness and Adversarial Robustness of Contrastive Textual Explanations
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Julia El Zini
M. Awad
AAML
237
2
0
17 Oct 2022
StyLEx: Explaining Style Using Human Lexical Annotations
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Shirley Anugrah Hayati
Kyumin Park
Dheeraj Rajagopal
Lyle Ungar
Luan Tuyen Chau
436
3
0
14 Oct 2022
On the Explainability of Natural Language Processing Deep Models
ACM Computing Surveys (ACM CSUR), 2022
Julia El Zini
M. Awad
312
116
0
13 Oct 2022
Explanations, Fairness, and Appropriate Reliance in Human-AI Decision-Making
International Conference on Human Factors in Computing Systems (CHI), 2022
Jakob Schoeffer
Maria De-Arteaga
Niklas Kuehl
FaML
552
84
0
23 Sep 2022
Towards Faithful Model Explanation in NLP: A Survey
Computational Linguistics (CL), 2022
Qing Lyu
Marianna Apidianaki
Chris Callison-Burch
XAI
639
189
0
22 Sep 2022
Looking for a Needle in a Haystack: A Comprehensive Study of Hallucinations in Neural Machine Translation
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Nuno M. Guerreiro
Elena Voita
André F. T. Martins
HILM
364
70
0
10 Aug 2022
Interpretable by Design: Learning Predictors by Composing Interpretable Queries
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Aditya Chattopadhyay
Stewart Slocum
B. Haeffele
René Vidal
D. Geman
307
33
0
03 Jul 2022
How to Dissect a Muppet: The Structure of Transformer Embedding Spaces
Transactions of the Association for Computational Linguistics (TACL), 2022
Timothee Mickus
Denis Paperno
Mathieu Constant
313
29
0
07 Jun 2022
On the Relationship Between Explanations, Fairness Perceptions, and Decisions
Jakob Schoeffer
Maria De-Arteaga
Niklas Kuehl
FaML
305
7
0
27 Apr 2022
Grad-SAM: Explaining Transformers via Gradient Self-Attention Maps
International Conference on Information and Knowledge Management (CIKM), 2021
Oren Barkan
Edan Hauon
Avi Caciularu
Ori Katz
Itzik Malkiel
Omri Armstrong
Noam Koenigstein
275
62
0
23 Apr 2022
The Risks of Machine Learning Systems
Samson Tan
Araz Taeihagh
K. Baxter
170
9
0
21 Apr 2022
ProtoTEx: Explaining Model Decisions with Prototype Tensors
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Anubrata Das
Chitrank Gupta
Venelin Kovatchev
Matthew Lease
Junjie Li
234
33
0
11 Apr 2022
Interpretation of Black Box NLP Models: A Survey
Shivani Choudhary
N. Chatterjee
S. K. Saha
FAtt
255
19
0
31 Mar 2022
Measuring the Mixing of Contextual Information in the Transformer
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Javier Ferrando
Gerard I. Gállego
Marta R. Costa-jussá
374
74
0
08 Mar 2022
Hierarchical Interpretation of Neural Text Classification
Computational Linguistics (CL), 2022
Hanqi Yan
Lin Gui
Yulan He
396
17
0
20 Feb 2022