ResearchTrend.AI
Discretized Integrated Gradients for Explaining Language Models
arXiv:2108.13654 · 31 August 2021
Soumya Sanyal, Xiang Ren
Topics: FAtt

Papers citing "Discretized Integrated Gradients for Explaining Language Models" (32 papers)
  1. Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations
     Yiyou Sun, Y. Gai, Lijie Chen, Abhilasha Ravichander, Yejin Choi, D. Song · HILM · 17 Apr 2025
  2. Reasoning-Grounded Natural Language Explanations for Language Models
     Vojtech Cahlik, Rodrigo Alves, Pavel Kordík · LRM · 14 Mar 2025
  3. Can Input Attributions Interpret the Inductive Reasoning Process Elicited in In-Context Learning?
     Mengyu Ye, Tatsuki Kuribayashi, Goro Kobayashi, Jun Suzuki · LRM · 20 Dec 2024
  4. Uniform Discretized Integrated Gradients: An effective attribution based method for explaining large language models
     Swarnava Sinha Roy, Ayan Kundu · FAtt · 05 Dec 2024
  5. One Mind, Many Tongues: A Deep Dive into Language-Agnostic Knowledge Neurons in Large Language Models
     Pengfei Cao, Yuheng Chen, Zhuoran Jin, Yubo Chen, Kang-Jun Liu, Jun Zhao · KELM · 26 Nov 2024
  6. Counterfactuals As a Means for Evaluating Faithfulness of Attribution Methods in Autoregressive Language Models
     Sepehr Kamahi, Yadollah Yaghoobzadeh · 21 Aug 2024
  7. Hard to Explain: On the Computational Hardness of In-Distribution Model Interpretation
     Guy Amir, Shahaf Bassan, Guy Katz · 07 Aug 2024
  8. "Sorry, Come Again?" Prompting -- Enhancing Comprehension and Diminishing Hallucination with [PAUSE]-injected Optimal Paraphrasing
     Vipula Rawte, Islam Tonmoy, M. M. Zaman, Prachi Priya, Marcin Kardas, Alan Schelten, Ruan Silva · LRM · 27 Mar 2024
  9. PE: A Poincare Explanation Method for Fast Text Hierarchy Generation
     Qian Chen, Dongyang Li, Xiaofeng He, Hongzhao Li, Hongyu Yi · 25 Mar 2024
  10. Backward Lens: Projecting Language Model Gradients into the Vocabulary Space
      Shahar Katz, Yonatan Belinkov, Mor Geva, Lior Wolf · 20 Feb 2024
  11. Identification of Knowledge Neurons in Protein Language Models
      Divya Nori, Shivali Singireddy, M. T. Have · MILM · 17 Dec 2023
  12. CIDR: A Cooperative Integrated Dynamic Refining Method for Minimal Feature Removal Problem
      Qian Chen, Tao Zhang, Dongyang Li, Xiaofeng He · 13 Dec 2023
  13. An Attribution Method for Siamese Encoders
      Lucas Moller, Dmitry Nikolaev, Sebastian Padó · 09 Oct 2023
  14. Explaining Speech Classification Models via Word-Level Audio Segments and Paralinguistic Features
      Eliana Pastor, Alkis Koudounas, Giuseppe Attanasio, Dirk Hovy, Elena Baralis · 14 Sep 2023
  15. Explainability for Large Language Models: A Survey
      Haiyan Zhao, Hanjie Chen, Fan Yang, Ninghao Liu, Huiqi Deng, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Mengnan Du · LRM · 02 Sep 2023
  16. Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons
      Yuheng Chen, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao · KELM · 25 Aug 2023
  17. Time Interpret: a Unified Model Interpretability Library for Time Series
      Joseph Enguehard · FAtt, AI4TS · 05 Jun 2023
  18. Sequential Integrated Gradients: a simple but effective method for explaining language models
      Joseph Enguehard · 25 May 2023
  19. Token-wise Decomposition of Autoregressive Language Model Hidden States for Analyzing Model Predictions
      Byung-Doh Oh, William Schuler · 17 May 2023
  20. Inseq: An Interpretability Toolkit for Sequence Generation Models
      Gabriele Sarti, Nils Feldhus, Ludwig Sickert, Oskar van der Wal, Malvina Nissim, Arianna Bisazza · 27 Feb 2023
  21. Comparing Baseline Shapley and Integrated Gradients for Local Explanation: Some Additional Insights
      Tianshu Feng, Zhipu Zhou, Tarun Joshi, V. Nair · FAtt · 12 Aug 2022
  22. Generalizability Analysis of Graph-based Trajectory Predictor with Vectorized Representation
      Juanwu Lu, Wei Zhan, M. Tomizuka, Yeping Hu · 06 Aug 2022
  23. ferret: a Framework for Benchmarking Explainers on Transformers
      Giuseppe Attanasio, Eliana Pastor, C. Bonaventura, Debora Nozza · 02 Aug 2022
  24. FRAME: Evaluating Rationale-Label Consistency Metrics for Free-Text Rationales
      Aaron Chan, Shaoliang Nie, Liang Tan, Xiaochang Peng, Hamed Firooz, Maziar Sanjabi, Xiang Ren · 02 Jul 2022
  25. SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features
      Juri Opitz, Anette Frank · 14 Jun 2022
  26. ER-Test: Evaluating Explanation Regularization Methods for Language Models
      Brihi Joshi, Aaron Chan, Ziyi Liu, Shaoliang Nie, Maziar Sanjabi, Hamed Firooz, Xiang Ren · AAML · 25 May 2022
  27. FaiRR: Faithful and Robust Deductive Reasoning over Natural Language
      Soumya Sanyal, Harman Singh, Xiang Ren · ReLM, LRM · 19 Mar 2022
  28. UNIREX: A Unified Learning Framework for Language Model Rationale Extraction
      Aaron Chan, Maziar Sanjabi, Lambert Mathias, L Tan, Shaoliang Nie, Xiaochang Peng, Xiang Ren, Hamed Firooz · 16 Dec 2021
  29. The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations
      Peter Hase, Harry Xie, Mohit Bansal · OODD, LRM, FAtt · 01 Jun 2021
  30. Connecting Attributions and QA Model Behavior on Realistic Counterfactuals
      Xi Ye, Rohan Nair, Greg Durrett · 09 Apr 2021
  31. Investigating Saturation Effects in Integrated Gradients
      Vivek Miglani, Narine Kokhlikyan, B. Alsallakh, Miguel Martin, Orion Reblitz-Richardson · FAtt · 23 Oct 2020
  32. Towards A Rigorous Science of Interpretable Machine Learning
      Finale Doshi-Velez, Been Kim · XAI, FaML · 28 Feb 2017