ResearchTrend.AI

arXiv:2505.19299
A Necessary Step toward Faithfulness: Measuring and Improving Consistency in Free-Text Explanations

25 May 2025
Lingjun Zhao
Hal Daumé III

Papers citing "A Necessary Step toward Faithfulness: Measuring and Improving Consistency in Free-Text Explanations"

39 citing papers
Fool Me Once? Contrasting Textual and Visual Explanations in a Clinical Decision-Support Setting
Maxime Kayser
Bayar I. Menzat
Cornelius Emde
Bogdan Bercean
Alex Novak
Abdala Espinosa
B. Papież
Susanne Gaube
Thomas Lukasiewicz
Oana-Maria Camburu
16 Oct 2024
Yi: Open Foundation Models by 01.AI
01.AI
Alex Young
Bei Chen
Chao Li
...
Yue Wang
Yuxuan Cai
Zhenyu Gu
Zhiyuan Liu
Zonghong Dai
OSLM
LRM
07 Mar 2024
Towards Consistent Natural-Language Explanations via Explanation-Consistency Finetuning
Yanda Chen
Chandan Singh
Xiaodong Liu
Simiao Zuo
Bin Yu
He He
Jianfeng Gao
LRM
25 Jan 2024
Is Ignorance Bliss? The Role of Post Hoc Explanation Faithfulness and Alignment in Model Trust in Laypeople and Domain Experts
Tessa Han
Yasha Ektefaie
Maha Farhat
Marinka Zitnik
Himabindu Lakkaraju
FAtt
09 Dec 2023
Explaining with Contrastive Phrasal Highlighting: A Case Study in Assisting Humans to Detect Translation Differences
Eleftheria Briakou
Navita Goyal
Marine Carpuat
04 Dec 2023
Tailoring Self-Rationalizers with Multi-Reward Distillation
Sahana Ramnath
Brihi Joshi
Skyler Hallinan
Ximing Lu
Liunian Harold Li
Aaron Chan
Jack Hessel
Yejin Choi
Xiang Ren
LRM
ReLM
06 Nov 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
18 Jul 2023
Measuring Faithfulness in Chain-of-Thought Reasoning
Tamera Lanham
Anna Chen
Ansh Radhakrishnan
Benoit Steiner
Carson E. Denison
...
Zac Hatfield-Dodds
Jared Kaplan
J. Brauner
Sam Bowman
Ethan Perez
ReLM
LRM
17 Jul 2023
FLamE: Few-shot Learning from Natural Language Explanations
Yangqiaoyu Zhou
Yiming Zhang
Chenhao Tan
LRM
FAtt
13 Jun 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
29 May 2023
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
Miles Turpin
Julian Michael
Ethan Perez
Sam Bowman
ReLM
LRM
07 May 2023
Towards Faithful Model Explanation in NLP: A Survey
Qing Lyu
Marianna Apidianaki
Chris Callison-Burch
XAI
22 Sep 2022
Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations
Jaehun Jung
Lianhui Qin
Sean Welleck
Faeze Brahman
Chandra Bhagavatula
Ronan Le Bras
Yejin Choi
ReLM
LRM
24 May 2022
The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning
Xi Ye
Greg Durrett
ReLM
LRM
06 May 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
21 Mar 2022
Reframing Human-AI Collaboration for Generating Free-Text Explanations
Sarah Wiegreffe
Jack Hessel
Swabha Swayamdipta
Mark O. Riedl
Yejin Choi
16 Dec 2021
Few-Shot Self-Rationalization with Natural Language Prompts
Ana Marasović
Iz Beltagy
Doug Downey
Matthew E. Peters
LRM
16 Nov 2021
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL
AI4TS
AI4CE
ALM
AIMat
17 Jun 2021
From Human Explanation to Model Interpretability: A Framework Based on Weight of Evidence
David Alvarez-Melis
Harmanpreet Kaur
Hal Daumé
Hanna M. Wallach
Jennifer Wortman Vaughan
FAtt
27 Apr 2021
Evaluating Explanations: How much do explanations from the teacher aid students?
Danish Pruthi
Rachit Bansal
Bhuwan Dhingra
Livio Baldini Soares
Michael Collins
Zachary Chase Lipton
Graham Neubig
William W. Cohen
FAtt
XAI
01 Dec 2020
NILE: Natural Language Inference with Faithful Natural Language Explanations
Sawan Kumar
Partha P. Talukdar
XAI
LRM
25 May 2020
Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior?
Peter Hase
Joey Tianyi Zhou
FAtt
04 May 2020
WT5?! Training Text-to-Text Models to Explain their Predictions
Sharan Narang
Colin Raffel
Katherine Lee
Adam Roberts
Noah Fiedel
Karishma Malkan
30 Apr 2020
Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness?
Alon Jacovi
Yoav Goldberg
XAI
07 Apr 2020
Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations
Oana-Maria Camburu
Brendan Shillingford
Pasquale Minervini
Thomas Lukasiewicz
Phil Blunsom
AAML
GAN
07 Oct 2019
Explain Yourself! Leveraging Language Models for Commonsense Reasoning
Nazneen Rajani
Bryan McCann
Caiming Xiong
R. Socher
ReLM
LRM
06 Jun 2019
e-SNLI: Natural Language Inference with Natural Language Explanations
Oana-Maria Camburu
Tim Rocktaschel
Thomas Lukasiewicz
Phil Blunsom
LRM
04 Dec 2018
Interpreting Neural Networks With Nearest Neighbors
Eric Wallace
Shi Feng
Jordan L. Boyd-Graber
AAML
FAtt
MILM
08 Sep 2018
Faithful Multimodal Explanation for Visual Question Answering
Jialin Wu
Raymond J. Mooney
08 Sep 2018
On the Robustness of Interpretability Methods
David Alvarez-Melis
Tommi Jaakkola
21 Jun 2018
Learning to Explain: An Information-Theoretic Perspective on Model Interpretation
Jianbo Chen
Le Song
Martin J. Wainwright
Michael I. Jordan
MLT
FAtt
21 Feb 2018
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence
Dong Huk Park
Lisa Anne Hendricks
Zeynep Akata
Anna Rohrbach
Bernt Schiele
Trevor Darrell
Marcus Rohrbach
15 Feb 2018
Explanation in Artificial Intelligence: Insights from the Social Sciences
Tim Miller
XAI
22 Jun 2017
A Unified Approach to Interpreting Model Predictions
Scott M. Lundberg
Su-In Lee
FAtt
22 May 2017
Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems
Wang Ling
Dani Yogatama
Chris Dyer
Phil Blunsom
AIMat
11 May 2017
Towards A Rigorous Science of Interpretable Machine Learning
Finale Doshi-Velez
Been Kim
XAI
FaML
28 Feb 2017
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Ramprasaath R. Selvaraju
Michael Cogswell
Abhishek Das
Ramakrishna Vedantam
Devi Parikh
Dhruv Batra
FAtt
07 Oct 2016
Evaluating the visualization of what a Deep Neural Network has learned
Wojciech Samek
Alexander Binder
G. Montavon
Sebastian Lapuschkin
K. Müller
XAI
21 Sep 2015
Finding Deceptive Opinion Spam by Any Stretch of the Imagination
Myle Ott
Yejin Choi
Claire Cardie
Jeffrey T. Hancock
22 Jul 2011