Faithful Explanations of Black-box NLP Models Using LLM-generated Counterfactuals
arXiv:2310.00603 · 1 October 2023
Y. Gat, Nitay Calderon, Amir Feder, Alexander Chapanin, Amit Sharma, Roi Reichart
Papers citing "Faithful Explanations of Black-box NLP Models Using LLM-generated Counterfactuals" (32 of 32 papers shown)
Can LLMs Explain Themselves Counterfactually?
Zahra Dehghanighobadi, Asja Fischer, Muhammad Bilal Zafar
LRM · 38 · 0 · 0 · 25 Feb 2025

Interpreting Language Reward Models via Contrastive Explanations
Junqi Jiang, Tom Bewley, Saumitra Mishra, Freddy Lecue, Manuela Veloso
74 · 0 · 0 · 25 Nov 2024

Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance
Omer Nahum, Nitay Calderon, Orgad Keller, Idan Szpektor, Roi Reichart
23 · 2 · 0 · 24 Oct 2024

Causality for Large Language Models
Anpeng Wu, Kun Kuang, Minqin Zhu, Yingrong Wang, Yujia Zheng, Kairong Han, B. Li, Guangyi Chen, Fei Wu, Kun Zhang
LRM · 46 · 7 · 0 · 20 Oct 2024

TAGExplainer: Narrating Graph Explanations for Text-Attributed Graph Learning Models
Bo Pan, Zhen Xiong, Guanchen Wu, Zheng Zhang, Yifei Zhang, Liang Zhao
FAtt · 36 · 1 · 0 · 20 Oct 2024

Towards Faithful Natural Language Explanations: A Study Using Activation Patching in Large Language Models
Wei Jie Yeo, Ranjan Satapathy, Erik Cambria
25 · 0 · 0 · 18 Oct 2024

NL-Eye: Abductive NLI for Images
Mor Ventura, Michael Toker, Nitay Calderon, Zorik Gekhman, Yonatan Bitton, Roi Reichart
28 · 1 · 0 · 03 Oct 2024

Counterfactual Token Generation in Large Language Models
Ivi Chatzi, N. C. Benz, Eleni Straitouri, Stratis Tsirtsis, Manuel Gomez Rodriguez
LRM · 34 · 3 · 0 · 25 Sep 2024

Causal Inference with Large Language Model: A Survey
Jing Ma
CML · LRM · 91 · 8 · 0 · 15 Sep 2024

Enhancing adversarial robustness in Natural Language Inference using explanations
Alexandros Koulakos, Maria Lymperaiou, Giorgos Filandrianos, Giorgos Stamou
SILM · AAML · 35 · 0 · 0 · 11 Sep 2024

Using LLMs for Explaining Sets of Counterfactual Examples to Final Users
Arturo Fredes, Jordi Vitria
CML · LRM · 28 · 3 · 0 · 27 Aug 2024

On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
Nitay Calderon, Roi Reichart
36 · 10 · 0 · 27 Jul 2024

A Survey on Natural Language Counterfactual Generation
Yongjie Wang, Xiaoqi Qiu, Yu Yue, Xu Guo, Zhiwei Zeng, Yuhong Feng, Zhiqi Shen
34 · 5 · 0 · 04 Jul 2024

Distributional reasoning in LLMs: Parallel reasoning processes in multi-hop reasoning
Yuval Shalev, Amir Feder, Ariel Goldstein
LRM · 39 · 4 · 0 · 19 Jun 2024

Large Language Models for Constrained-Based Causal Discovery
Kai-Hendrik Cohrs, Gherardo Varando, Emiliano Díaz, Vasileios Sitokonstantinou, Gustau Camps-Valls
41 · 9 · 0 · 11 Jun 2024

Beyond Agreement: Diagnosing the Rationale Alignment of Automated Essay Scoring Methods based on Linguistically-informed Counterfactuals
Yupei Wang, Renfen Hu, Zhe Zhao
32 · 2 · 0 · 29 May 2024

Large Language Models and Causal Inference in Collaboration: A Survey
Xiaoyu Liu, Paiheng Xu, Junda Wu, Jiaxin Yuan, Yifan Yang, ..., Haoliang Wang, Tong Yu, Julian McAuley, Wei Ai, Furong Huang
ELM · LRM · 77 · 5 · 0 · 14 Mar 2024

Large Scale Foundation Models for Intelligent Manufacturing Applications: A Survey
Haotian Zhang, S. D. Semujju, Zhicheng Wang, Xianwei Lv, Kang Xu, ..., Jing Wu, Zhuo Long, Wensheng Liang, Xiaoguang Ma, Ruiyan Zhuang
UQCV · AI4TS · AI4CE · 27 · 4 · 0 · 11 Dec 2023

On Measuring Faithfulness or Self-consistency of Natural Language Explanations
Letitia Parcalabescu, Anette Frank
LRM · 69 · 20 · 0 · 13 Nov 2023

T-COL: Generating Counterfactual Explanations for General User Preferences on Variable Machine Learning Systems
Yiming Li, Daling Wang, Wenfang Wu, Shi Feng, Yifei Zhang
CML · 40 · 1 · 0 · 28 Sep 2023

CATfOOD: Counterfactual Augmented Training for Improving Out-of-Domain Performance and Calibration
Rachneet Sachdeva, Martin Tutek, Iryna Gurevych
OODD · 22 · 10 · 0 · 14 Sep 2023

Measuring the Robustness of NLP Models to Domain Shifts
Nitay Calderon, Naveh Porat, Eyal Ben-David, Alexander Chapanin, Zorik Gekhman, Nadav Oved, Vitaly Shalumov, Roi Reichart
16 · 6 · 0 · 31 May 2023

A Systematic Study of Knowledge Distillation for Natural Language Generation with Pseudo-Target Training
Nitay Calderon, Subhabrata Mukherjee, Roi Reichart, Amir Kantor
31 · 17 · 0 · 03 May 2023

Causal Proxy Models for Concept-Based Model Explanations
Zhengxuan Wu, Karel D'Oosterlinck, Atticus Geiger, Amir Zur, Christopher Potts
MILM · 75 · 35 · 0 · 28 Sep 2022

Towards Faithful Model Explanation in NLP: A Survey
Qing Lyu, Marianna Apidianaki, Chris Callison-Burch
XAI · 106 · 107 · 0 · 22 Sep 2022

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM · ALM · 311 · 11,915 · 0 · 04 Mar 2022

Framework for Evaluating Faithfulness of Local Explanations
S. Dasgupta, Nave Frost, Michal Moshkovitz
FAtt · 111 · 61 · 0 · 01 Feb 2022

Rethinking Attention-Model Explainability through Faithfulness Violation Test
Y. Liu, Haoliang Li, Yangyang Guo, Chen Kong, Jing Li, Shiqi Wang
FAtt · 116 · 42 · 0 · 28 Jan 2022

Tailor: Generating and Perturbing Text with Semantic Controls
Alexis Ross, Tongshuang Wu, Hao Peng, Matthew E. Peters, Matt Gardner
136 · 77 · 0 · 15 Jul 2021

A Survey on Stance Detection for Mis- and Disinformation Identification
Momchil Hardalov, Arnav Arora, Preslav Nakov, Isabelle Augenstein
109 · 132 · 0 · 27 Feb 2021

Measuring Association Between Labels and Free-Text Rationales
Sarah Wiegreffe, Ana Marasović, Noah A. Smith
274 · 170 · 0 · 24 Oct 2020

What you can cram into a single vector: Probing sentence embeddings for linguistic properties
Alexis Conneau, Germán Kruszewski, Guillaume Lample, Loïc Barrault, Marco Baroni
199 · 882 · 0 · 03 May 2018