arXiv:2108.13654
Discretized Integrated Gradients for Explaining Language Models
31 August 2021
Soumya Sanyal
Xiang Ren
FAtt
Papers citing
"Discretized Integrated Gradients for Explaining Language Models"
32 / 32 papers shown
Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations
Yiyou Sun
Y. Gai
Lijie Chen
Abhilasha Ravichander
Yejin Choi
D. Song
HILM
57
0
0
17 Apr 2025
Reasoning-Grounded Natural Language Explanations for Language Models
Vojtech Cahlik
Rodrigo Alves
Pavel Kordík
LRM
51
1
0
14 Mar 2025
Can Input Attributions Interpret the Inductive Reasoning Process Elicited in In-Context Learning?
Mengyu Ye
Tatsuki Kuribayashi
Goro Kobayashi
Jun Suzuki
LRM
92
0
0
20 Dec 2024
Uniform Discretized Integrated Gradients: An effective attribution based method for explaining large language models
Swarnava Sinha Roy
Ayan Kundu
FAtt
71
0
0
05 Dec 2024
One Mind, Many Tongues: A Deep Dive into Language-Agnostic Knowledge Neurons in Large Language Models
Pengfei Cao
Yuheng Chen
Zhuoran Jin
Yubo Chen
Kang-Jun Liu
Jun Zhao
KELM
70
0
0
26 Nov 2024
Counterfactuals As a Means for Evaluating Faithfulness of Attribution Methods in Autoregressive Language Models
Sepehr Kamahi
Yadollah Yaghoobzadeh
42
0
0
21 Aug 2024
Hard to Explain: On the Computational Hardness of In-Distribution Model Interpretation
Guy Amir
Shahaf Bassan
Guy Katz
42
2
0
07 Aug 2024
"Sorry, Come Again?" Prompting -- Enhancing Comprehension and Diminishing Hallucination with [PAUSE]-injected Optimal Paraphrasing
Vipula Rawte
Islam Tonmoy
M. M. Zaman
Prachi Priya
Marcin Kardas
Alan Schelten
Ruan Silva
LRM
28
1
0
27 Mar 2024
PE: A Poincare Explanation Method for Fast Text Hierarchy Generation
Qian Chen
Dongyang Li
Xiaofeng He
Hongzhao Li
Hongyu Yi
16
0
0
25 Mar 2024
Backward Lens: Projecting Language Model Gradients into the Vocabulary Space
Shahar Katz
Yonatan Belinkov
Mor Geva
Lior Wolf
57
10
1
20 Feb 2024
Identification of Knowledge Neurons in Protein Language Models
Divya Nori
Shivali Singireddy
M. T. Have
MILM
11
2
0
17 Dec 2023
CIDR: A Cooperative Integrated Dynamic Refining Method for Minimal Feature Removal Problem
Qian Chen
Tao Zhang
Dongyang Li
Xiaofeng He
26
0
0
13 Dec 2023
An Attribution Method for Siamese Encoders
Lucas Moller
Dmitry Nikolaev
Sebastian Padó
15
4
0
09 Oct 2023
Explaining Speech Classification Models via Word-Level Audio Segments and Paralinguistic Features
Eliana Pastor
Alkis Koudounas
Giuseppe Attanasio
Dirk Hovy
Elena Baralis
11
4
0
14 Sep 2023
Explainability for Large Language Models: A Survey
Haiyan Zhao
Hanjie Chen
Fan Yang
Ninghao Liu
Huiqi Deng
Hengyi Cai
Shuaiqiang Wang
Dawei Yin
Mengnan Du
LRM
21
408
0
02 Sep 2023
Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons
Yuheng Chen
Pengfei Cao
Yubo Chen
Kang Liu
Jun Zhao
KELM
25
41
0
25 Aug 2023
Time Interpret: a Unified Model Interpretability Library for Time Series
Joseph Enguehard
FAtt
AI4TS
20
4
0
05 Jun 2023
Sequential Integrated Gradients: a simple but effective method for explaining language models
Joseph Enguehard
20
38
0
25 May 2023
Token-wise Decomposition of Autoregressive Language Model Hidden States for Analyzing Model Predictions
Byung-Doh Oh
William Schuler
24
2
0
17 May 2023
Inseq: An Interpretability Toolkit for Sequence Generation Models
Gabriele Sarti
Nils Feldhus
Ludwig Sickert
Oskar van der Wal
Malvina Nissim
Arianna Bisazza
30
64
0
27 Feb 2023
Comparing Baseline Shapley and Integrated Gradients for Local Explanation: Some Additional Insights
Tianshu Feng
Zhipu Zhou
Tarun Joshi
V. Nair
FAtt
20
4
0
12 Aug 2022
Generalizability Analysis of Graph-based Trajectory Predictor with Vectorized Representation
Juanwu Lu
Wei Zhan
M. Tomizuka
Yeping Hu
20
6
0
06 Aug 2022
ferret: a Framework for Benchmarking Explainers on Transformers
Giuseppe Attanasio
Eliana Pastor
C. Bonaventura
Debora Nozza
33
30
0
02 Aug 2022
FRAME: Evaluating Rationale-Label Consistency Metrics for Free-Text Rationales
Aaron Chan
Shaoliang Nie
Liang Tan
Xiaochang Peng
Hamed Firooz
Maziar Sanjabi
Xiang Ren
40
9
0
02 Jul 2022
SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features
Juri Opitz
Anette Frank
26
32
0
14 Jun 2022
ER-Test: Evaluating Explanation Regularization Methods for Language Models
Brihi Joshi
Aaron Chan
Ziyi Liu
Shaoliang Nie
Maziar Sanjabi
Hamed Firooz
Xiang Ren
AAML
30
6
0
25 May 2022
FaiRR: Faithful and Robust Deductive Reasoning over Natural Language
Soumya Sanyal
Harman Singh
Xiang Ren
ReLM
LRM
24
44
0
19 Mar 2022
UNIREX: A Unified Learning Framework for Language Model Rationale Extraction
Aaron Chan
Maziar Sanjabi
Lambert Mathias
L Tan
Shaoliang Nie
Xiaochang Peng
Xiang Ren
Hamed Firooz
38
41
0
16 Dec 2021
The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations
Peter Hase
Harry Xie
Mohit Bansal
OODD
LRM
FAtt
18
91
0
01 Jun 2021
Connecting Attributions and QA Model Behavior on Realistic Counterfactuals
Xi Ye
Rohan Nair
Greg Durrett
16
24
0
09 Apr 2021
Investigating Saturation Effects in Integrated Gradients
Vivek Miglani
Narine Kokhlikyan
B. Alsallakh
Miguel Martin
Orion Reblitz-Richardson
FAtt
16
23
0
23 Oct 2020
Towards A Rigorous Science of Interpretable Machine Learning
Finale Doshi-Velez
Been Kim
XAI
FaML
251
3,683
0
28 Feb 2017