Contrastive Explanations for Model Interpretability

2 March 2021

Swabha Swayamdipta

Shauli Ravfogel

Yejin Choi

Papers citing "Contrastive Explanations for Model Interpretability"

Title | Authors | Tags | Counts | Date
Comparative Explanations: Explanation Guided Decision Making for Human-in-the-Loop Preference Selection | Tanmay Chakraborty, Christian Wirth, Christin Seifert | - | 26 0 0 | 01 Apr 2025
Contrastive Explanations That Anticipate Human Misconceptions Can Improve Human Decision-Making Skills | Zana Buçinca, S. Swaroop, Amanda E. Paluch, Finale Doshi-Velez, Krzysztof Z. Gajos | - | 48 2 0 | 05 Oct 2024
CELL your Model: Contrastive Explanations for Large Language Models | Ronny Luss, Erik Miehling, Amit Dhurandhar | - | 40 0 0 | 17 Jun 2024
Heterogeneous Contrastive Learning for Foundation Models and Beyond | Lecheng Zheng, Baoyu Jing, Zihao Li, Hanghang Tong, Jingrui He | VLM | 24 19 0 | 30 Mar 2024
A Geometric Notion of Causal Probing | Clément Guerner, Anej Svete, Tianyu Liu, Alex Warstadt, Ryan Cotterell | LLMSV | 34 12 0 | 27 Jul 2023
Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations | Yanda Chen, Ruiqi Zhong, Narutatsu Ri, Chen Zhao, He He, Jacob Steinhardt, Zhou Yu, Kathleen McKeown | LRM | 24 47 0 | 17 Jul 2023
Surfacing Biases in Large Language Models using Contrastive Input Decoding | G. Yona, Or Honovich, Itay Laish, Roee Aharoni | - | 24 11 0 | 12 May 2023
ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning | O. Yu. Golovneva, Moya Chen, Spencer Poff, Martin Corredor, Luke Zettlemoyer, Maryam Fazel-Zarandi, Asli Celikyilmaz | ReLM, LRM | 20 137 0 | 15 Dec 2022
CCPrefix: Counterfactual Contrastive Prefix-Tuning for Many-Class Classification | Y. Li, Canran Xu, Guodong Long, Tao Shen, Chongyang Tao, Jing Jiang | - | 38 1 0 | 11 Nov 2022
A General Search-based Framework for Generating Textual Counterfactual Explanations | Daniel Gilo, Shaul Markovitch | LRM | 16 0 0 | 01 Nov 2022
Log-linear Guardedness and its Implications | Shauli Ravfogel, Yoav Goldberg, Ryan Cotterell | - | 23 2 0 | 18 Oct 2022
Interpretation of Black Box NLP Models: A Survey | Shivani Choudhary, N. Chatterjee, S. K. Saha | FAtt | 28 10 0 | 31 Mar 2022
Interpreting Deep Learning Models in Natural Language Processing: A Review | Xiaofei Sun, Diyi Yang, Xiaoya Li, Tianwei Zhang, Yuxian Meng, Han Qiu, Guoyin Wang, Eduard H. Hovy, Jiwei Li | - | 15 44 0 | 20 Oct 2021
Let the CAT out of the bag: Contrastive Attributed explanations for Text | Saneem A. Chemmengath, A. Azad, Ronny Luss, Amit Dhurandhar | FAtt | 26 10 0 | 16 Sep 2021
Hypothesis Only Baselines in Natural Language Inference | Adam Poliak, Jason Naradowsky, Aparajita Haldar, Rachel Rudinger, Benjamin Van Durme | - | 187 576 0 | 02 May 2018