Post-hoc Interpretability for Neural NLP: A Survey

10 August 2021

Siva Reddy

Papers citing "Post-hoc Interpretability for Neural NLP: A Survey"

37 / 37 papers shown

Short-circuiting Shortcuts: Mechanistic Investigation of Shortcuts in Text Classification Leon Eshuijs Shihan Wang Antske Fokkens 21 0 0 09 May 2025
Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods Mahdi Dhaini Ege Erdogan Nils Feldhus Gjergji Kasneci 39 0 0 02 May 2025
Superscopes: Amplifying Internal Feature Representations for Language Model Interpretation Jonathan Jacobi Gal Niv LRM ReLM 55 0 0 03 Mar 2025
FitCF: A Framework for Automatic Feature Importance-guided Counterfactual Example Generation Qianli Wang Nils Feldhus Simon Ostermann Luis Felipe Villa-Arenas Sebastian Möller Vera Schmitt AAML 34 0 0 01 Jan 2025
SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation Dennis Fucci Marco Gaido Beatrice Savoldi Matteo Negri Mauro Cettolo L. Bentivogli 49 1 0 03 Nov 2024
Latent Concept-based Explanation of NLP Models Xuemin Yu Fahim Dalvi Nadir Durrani Marzia Nouri Hassan Sajjad LRM FAtt 19 1 0 18 Apr 2024
Credit Risk Meets Large Language Models: Building a Risk Indicator from Loan Descriptions in P2P Lending Mario Sanz-Guerrero Javier Arroyo 28 4 0 29 Jan 2024
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models Asma Ghandeharioun Avi Caciularu Adam Pearce Lucas Dixon Mor Geva 25 87 0 11 Jan 2024
Interpreting Pretrained Language Models via Concept Bottlenecks Zhen Tan Lu Cheng Song Wang Yuan Bo Jundong Li Huan Liu LRM 22 20 0 08 Nov 2023
Codebook Features: Sparse and Discrete Interpretability for Neural Networks Alex Tamkin Mohammad Taufeeque Noah D. Goodman 17 27 0 26 Oct 2023
InterroLang: Exploring NLP Models and Datasets through Dialogue-based Explanations Nils Feldhus Qianli Wang Tatiana Anikina Sahil Chopra Cennet Oguz Sebastian Möller 19 9 0 09 Oct 2023
Computational modeling of semantic change Nina Tahmasebi Haim Dubossarsky 26 6 0 13 Apr 2023
Multi-resolution Interpretation and Diagnostics Tool for Natural Language Classifiers P. Jalali Nengfeng Zhou Yufei Yu AAML 17 0 0 06 Mar 2023
Understanding the Role of Human Intuition on Reliance in Human-AI Decision-Making with Explanations Valerie Chen Q. V. Liao Jennifer Wortman Vaughan Gagan Bansal 36 103 0 18 Jan 2023
Universal and Independent: Multilingual Probing Framework for Exhaustive Model Interpretation and Evaluation O. Serikov Vitaly Protasov E. Voloshina V. Knyazkova Tatiana Shavrina 11 3 0 24 Oct 2022
Explainable Causal Analysis of Mental Health on Social Media Data Chandni Saxena Muskan Garg G. Saxena CML 16 8 0 16 Oct 2022
Review of Natural Language Processing in Pharmacology D. Trajanov Vangel Trajkovski Makedonka Dimitrieva Jovana Dobreva Milos Jovanovik Matej Klemen Aleš Žagar Marko Robnik-Šikonja LM&MA 13 7 0 22 Aug 2022
ferret: a Framework for Benchmarking Explainers on Transformers Giuseppe Attanasio Eliana Pastor C. Bonaventura Debora Nozza 15 30 0 02 Aug 2022
Is Attention Interpretation? A Quantitative Assessment On Sets Jonathan Haab N. Deutschmann María Rodríguez Martínez 8 6 0 26 Jul 2022
Mediators: Conversational Agents Explaining NLP Model Behavior Nils Feldhus A. Ravichandran Sebastian Möller 25 16 0 13 Jun 2022
Interactive Model Cards: A Human-Centered Approach to Model Documentation Anamaria Crisan Margaret Drouhard Jesse Vig Nazneen Rajani HAI 15 86 0 05 May 2022
Interpretation of Black Box NLP Models: A Survey Shivani Choudhary N. Chatterjee S. K. Saha FAtt 28 10 0 31 Mar 2022
Measuring the Mixing of Contextual Information in the Transformer Javier Ferrando Gerard I. Gállego Marta R. Costa-jussá 21 48 0 08 Mar 2022
"Will You Find These Shortcuts?" A Protocol for Evaluating the Faithfulness of Input Salience Methods for Text Classification Jasmijn Bastings Sebastian Ebert Polina Zablotskaia Anders Sandholm Katja Filippova 107 75 0 14 Nov 2021
Explainable AI (XAI): A Systematic Meta-Survey of Current Challenges and Future Opportunities Waddah Saeed C. Omlin XAI 34 414 0 11 Nov 2021
Evaluating the Faithfulness of Importance Measures in NLP by Recursively Masking Allegedly Important Tokens and Retraining Andreas Madsen Nicholas Meade Vaibhav Adlakha Siva Reddy 96 35 0 15 Oct 2021
Probing Classifiers: Promises, Shortcomings, and Advances Yonatan Belinkov 221 402 0 24 Feb 2021
UnNatural Language Inference Koustuv Sinha Prasanna Parthasarathi Joelle Pineau Adina Williams 211 94 0 30 Dec 2020
It's Morphin' Time! Combating Linguistic Discrimination with Inflectional Perturbations Samson Tan Shafiq R. Joty Min-Yen Kan R. Socher 152 103 0 09 May 2020
Scaling Laws for Neural Language Models Jared Kaplan Sam McCandlish T. Henighan Tom B. Brown B. Chess R. Child Scott Gray Alec Radford Jeff Wu Dario Amodei 226 4,424 0 23 Jan 2020
A Survey on Bias and Fairness in Machine Learning Ninareh Mehrabi Fred Morstatter N. Saxena Kristina Lerman Aram Galstyan SyDa FaML 294 4,187 0 23 Aug 2019
e-SNLI: Natural Language Inference with Natural Language Explanations Oana-Maria Camburu Tim Rocktäschel Thomas Lukasiewicz Phil Blunsom LRM 252 620 0 04 Dec 2018
What you can cram into a single vector: Probing sentence embeddings for linguistic properties Alexis Conneau Germán Kruszewski Guillaume Lample Loïc Barrault Marco Baroni 199 879 0 03 May 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 294 6,943 0 20 Apr 2018
A causal framework for explaining the predictions of black-box sequence-to-sequence models David Alvarez-Melis Tommi Jaakkola CML 219 201 0 06 Jul 2017
Towards A Rigorous Science of Interpretable Machine Learning Finale Doshi-Velez Been Kim XAI FaML 225 3,672 0 28 Feb 2017
Efficient Estimation of Word Representations in Vector Space Tomáš Mikolov Kai Chen G. Corrado J. Dean 3DV 228 31,150 0 16 Jan 2013