ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness?
Alon Jacovi, Yoav Goldberg
7 April 2020 · arXiv:2004.03685 · XAI

Papers citing "Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness?"

50 / 130 papers shown
From Pixels to Perception: Interpretable Predictions via Instance-wise Grouped Feature Selection
Moritz Vandenhirtz, Julia E. Vogt · 09 May 2025

Reasoning Models Don't Always Say What They Think
Yanda Chen, Joe Benton, Ansh Radhakrishnan, Jonathan Uesato, Carson E. Denison, ..., Vlad Mikulik, Samuel R. Bowman, Jan Leike, Jared Kaplan, E. Perez · ReLM, LRM · 08 May 2025

Adversarial Cooperative Rationalization: The Risk of Spurious Correlations in Even Clean Datasets
W. Liu, Zhongyu Niu, Lang Gao, Zhiying Deng, Jun Wang, H. Wang, Ruixuan Li · 04 May 2025

PhysNav-DG: A Novel Adaptive Framework for Robust VLM-Sensor Fusion in Navigation Applications
Trisanth Srinivasan, Santosh Patapati · 03 May 2025

Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods
Mahdi Dhaini, Ege Erdogan, Nils Feldhus, Gjergji Kasneci · 02 May 2025

A constraints-based approach to fully interpretable neural networks for detecting learner behaviors
Juan D. Pinto, Luc Paquette · 10 Apr 2025

Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation
Bowen Baker, Joost Huizinga, Leo Gao, Zehao Dou, M. Guan, Aleksander Mądry, Wojciech Zaremba, J. Pachocki, David Farhi · LRM · 14 Mar 2025

A Unified Framework with Novel Metrics for Evaluating the Effectiveness of XAI Techniques in LLMs
Melkamu Mersha, Mesay Gemeda Yigezu, Hassan Shakil, Ali Al shami, SangHyun Byun, Jugal Kalita · 06 Mar 2025

Beyond Translation: LLM-Based Data Generation for Multilingual Fact-Checking
Yi-Ling Chung, Aurora Cobo, Pablo Serna · SyDa, HILM · 24 Feb 2025

A Survey of Model Architectures in Information Retrieval
Zhichao Xu, Fengran Mo, Zhiqi Huang, Crystina Zhang, Puxuan Yu, Bei Wang, Jimmy J. Lin, Vivek Srikumar · KELM, 3DV · 21 Feb 2025

A Study of the Plausibility of Attention between RNN Encoders in Natural Language Inference
Duc Hau Nguyen, Pascale Sébillot · 23 Jan 2025

Regularization, Semi-supervision, and Supervision for a Plausible Attention-Based Explanation
Duc Hau Nguyen, Cyrielle Mallart, Guillaume Gravier, Pascale Sébillot · 22 Jan 2025

A Tale of Two Imperatives: Privacy and Explainability
Supriya Manna, Niladri Sett · 30 Dec 2024

SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
Dennis Fucci, Marco Gaido, Beatrice Savoldi, Matteo Negri, Mauro Cettolo, L. Bentivogli · 03 Nov 2024

Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination
Jerry Huang, Prasanna Parthasarathi, Mehdi Rezagholizadeh, Boxing Chen, Sarath Chandar · 22 Oct 2024

FLARE: Faithful Logic-Aided Reasoning and Exploration
Erik Arakelyan, Pasquale Minervini, Pat Verga, Patrick Lewis, Isabelle Augenstein · ReLM, LRM · 14 Oct 2024

F-Fidelity: A Robust Framework for Faithfulness Evaluation of Explainable AI
Xu Zheng, Farhad Shirani, Zhuomin Chen, Chaohao Lin, Wei Cheng, Wenbo Guo, Dongsheng Luo · AAML · 03 Oct 2024

COOL: Efficient and Reliable Chain-Oriented Objective Logic with Neural Networks Feedback Control for Program Synthesis
Jipeng Han · 02 Oct 2024

Explainable AI needs formal notions of explanation correctness
Stefan Haufe, Rick Wilming, Benedict Clark, Rustam Zhumagambetov, Danny Panknin, Ahcène Boubekki · XAI · 22 Sep 2024

Prompts Are Programs Too! Understanding How Developers Build Software Containing Prompts
Jenny T Liang, Melissa Lin, Nikitha Rao, Brad A. Myers · 19 Sep 2024

Counterfactuals As a Means for Evaluating Faithfulness of Attribution Methods in Autoregressive Language Models
Sepehr Kamahi, Yadollah Yaghoobzadeh · 21 Aug 2024

Visual Agents as Fast and Slow Thinkers
Guangyan Sun, Mingyu Jin, Zhenting Wang, Cheng-Long Wang, Siqi Ma, Qifan Wang, Ying Nian Wu, Dongfang Liu · LLMAG, LRM · 16 Aug 2024

Data Debugging is NP-hard for Classifiers Trained with SGD
Zizheng Guo, Pengyu Chen, Yanzhang Fu, Xuelong Li · 02 Aug 2024

Faithful and Plausible Natural Language Explanations for Image Classification: A Pipeline Approach
Adam Wojciechowski, Mateusz Lango, Ondrej Dusek · FAtt · 30 Jul 2024

On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
Nitay Calderon, Roi Reichart · 27 Jul 2024

Exploring the Plausibility of Hate and Counter Speech Detectors with Explainable AI
Adrian Jaques Böck, D. Slijepcevic, Matthias Zeppelzauer · 25 Jul 2024

Transformer Circuit Faithfulness Metrics are not Robust
Joseph Miller, Bilal Chughtai, William Saunders · 11 Jul 2024

A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai, Yilun Zhou, Shi Feng, Abulhair Saparov, Ziyu Yao · 02 Jul 2024

Evaluating Human Alignment and Model Faithfulness of LLM Rationale
Mohsen Fayyaz, Fan Yin, Jiao Sun, Nanyun Peng · 28 Jun 2024

Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?
Letitia Parcalabescu, Anette Frank · MLLM, CoGe, VLM · 29 Apr 2024

RankingSHAP -- Listwise Feature Attribution Explanations for Ranking Models
Maria Heuss, Maarten de Rijke, Avishek Anand · 24 Mar 2024

Best of Both Worlds: A Pliable and Generalizable Neuro-Symbolic Approach for Relation Classification
Robert Vacareanu, F. Alam, M. Islam, Haris Riaz, Mihai Surdeanu · NAI · 05 Mar 2024

B-Cos Aligned Transformers Learn Human-Interpretable Features
Manuel Tran, Amal Lahiani, Yashin Dicente Cid, Melanie Boxberg, Peter Lienemann, C. Matek, S. J. Wagner, Fabian J. Theis, Eldad Klaiman, Tingying Peng · MedIm, ViT · 16 Jan 2024

Evaluating Language Model Agency through Negotiations
Tim R. Davidson, V. Veselovsky, Martin Josifoski, Maxime Peyrard, Antoine Bosselut, Michal Kosinski, Robert West · LLMAG · 09 Jan 2024

ALMANACS: A Simulatability Benchmark for Language Model Explainability
Edmund Mills, Shiye Su, Stuart J. Russell, Scott Emmons · 20 Dec 2023

The Problem of Coherence in Natural Language Explanations of Recommendations
Jakub Raczynski, Mateusz Lango, Jerzy Stefanowski · 18 Dec 2023

A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia
Giovanni Monea, Maxime Peyrard, Martin Josifoski, Vishrav Chaudhary, Jason Eisner, Emre Kiciman, Hamid Palangi, Barun Patra, Robert West · KELM · 04 Dec 2023

Improving Interpretation Faithfulness for Vision Transformers
Lijie Hu, Yixin Liu, Ninghao Liu, Mengdi Huai, Lichao Sun, Di Wang · 29 Nov 2023

Interpreting and Exploiting Functional Specialization in Multi-Head Attention under Multi-task Learning
Chong Li, Shaonan Wang, Yunhao Zhang, Jiajun Zhang, Chengqing Zong · 16 Oct 2023

Evaluating Explanation Methods for Vision-and-Language Navigation
Guanqi Chen, Lei Yang, Guanhua Chen, Jia Pan · XAI · 10 Oct 2023

DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers
Anna Langedijk, Hosein Mohebbi, Gabriele Sarti, Willem H. Zuidema, Jaap Jumelet · 05 Oct 2023

Situated Natural Language Explanations
Zining Zhu, Hao Jiang, Jingfeng Yang, Sreyashi Nag, Chao Zhang, Jie Huang, Yifan Gao, Frank Rudzicz, Bing Yin · LRM · 27 Aug 2023

Generative Models as a Complex Systems Science: How can we make sense of large language model behavior?
Ari Holtzman, Peter West, Luke Zettlemoyer · AI4CE · 31 Jul 2023

Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations
Yanda Chen, Ruiqi Zhong, Narutatsu Ri, Chen Zhao, He He, Jacob Steinhardt, Zhou Yu, Kathleen McKeown · LRM · 17 Jul 2023

DARE: Towards Robust Text Explanations in Biomedical and Healthcare Applications
Adam Ivankay, Mattia Rigotti, P. Frossard · OOD, MedIm · 05 Jul 2023

AI Transparency in the Age of LLMs: A Human-Centered Research Roadmap
Q. V. Liao, J. Vaughan · 02 Jun 2023

MaNtLE: Model-agnostic Natural Language Explainer
Rakesh R Menon, Kerem Zaman, Shashank Srivastava · FAtt, LRM · 22 May 2023

VISION DIFFMASK: Faithful Interpretation of Vision Transformers with Differentiable Patch Masking
A. Nalmpantis, Apostolos Panagiotopoulos, John Gkountouras, Konstantinos Papakostas, Wilker Aziz · 13 Apr 2023

Faithful Chain-of-Thought Reasoning
Qing Lyu, Shreya Havaldar, Adam Stein, Li Zhang, D. Rao, Eric Wong, Marianna Apidianaki, Chris Callison-Burch · ReLM, LRM · 31 Jan 2023

The State of Human-centered NLP Technology for Fact-checking
Anubrata Das, Houjiang Liu, Venelin Kovatchev, Matthew Lease · HILM · 08 Jan 2023