arXiv: 2401.07927

Are self-explanations from Large Language Models faithful?
Andreas Madsen, Sarath Chandar, Siva Reddy
15 January 2024
Tags: LRM
Papers citing "Are self-explanations from Large Language Models faithful?" (10 papers shown):

1. SEER: Self-Explainability Enhancement of Large Language Models' Representations
   Guanxu Chen, Dongrui Liu, Tao Luo, Jing Shao. Tags: LRM, MILM. 07 Feb 2025.

2. Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination
   Jerry Huang, Prasanna Parthasarathi, Mehdi Rezagholizadeh, Boxing Chen, Sarath Chandar. 22 Oct 2024.

3. Evaluating the Reliability of Self-Explanations in Large Language Models
   Korbinian Randl, John Pavlopoulos, Aron Henriksson, Tony Lindgren. Tags: LRM. 19 Jul 2024.

4. Evaluating Human Alignment and Model Faithfulness of LLM Rationale
   Mohsen Fayyaz, Fan Yin, Jiao Sun, Nanyun Peng. 28 Jun 2024.

5. CELL your Model: Contrastive Explanations for Large Language Models
   Ronny Luss, Erik Miehling, Amit Dhurandhar. 17 Jun 2024.

6. SCOTT: Self-Consistent Chain-of-Thought Distillation
   Jamie Yap, Zhengyang Wang, Zheng Li, K. Lynch, Bing Yin, Xiang Ren. Tags: LRM. 03 May 2023.

7. "Will You Find These Shortcuts?" A Protocol for Evaluating the Faithfulness of Input Salience Methods for Text Classification
   Jasmijn Bastings, Sebastian Ebert, Polina Zablotskaia, Anders Sandholm, Katja Filippova. 14 Nov 2021.

8. Evaluating the Faithfulness of Importance Measures in NLP by Recursively Masking Allegedly Important Tokens and Retraining
   Andreas Madsen, Nicholas Meade, Vaibhav Adlakha, Siva Reddy. 15 Oct 2021.

9. e-SNLI: Natural Language Inference with Natural Language Explanations
   Oana-Maria Camburu, Tim Rocktaschel, Thomas Lukasiewicz, Phil Blunsom. Tags: LRM. 04 Dec 2018.

10. Towards A Rigorous Science of Interpretable Machine Learning
    Finale Doshi-Velez, Been Kim. Tags: XAI, FaML. 28 Feb 2017.