Explaining How Transformers Use Context to Build Predictions
Javier Ferrando, Gerard I. Gállego, Ioannis Tsiamas, Marta R. Costa-jussà
arXiv 2305.12535 · 21 May 2023
Papers citing "Explaining How Transformers Use Context to Build Predictions" (28 papers):
- A Close Look at Decomposition-based XAI-Methods for Transformer Language Models. L. Arras, Bruno Puri, Patrick Kahardipraja, Sebastian Lapuschkin, Wojciech Samek. 21 Feb 2025.
- LLMs as a synthesis between symbolic and continuous approaches to language. Gemma Boleda. [SyDa] 17 Feb 2025.
- Can Input Attributions Interpret the Inductive Reasoning Process Elicited in In-Context Learning? Mengyu Ye, Tatsuki Kuribayashi, Goro Kobayashi, Jun Suzuki. [LRM] 20 Dec 2024.
- On Explaining with Attention Matrices. Omar Naim, Nicholas Asher. 24 Oct 2024.
- Counterfactuals As a Means for Evaluating Faithfulness of Attribution Methods in Autoregressive Language Models. Sepehr Kamahi, Yadollah Yaghoobzadeh. 21 Aug 2024.
- The Mechanics of Conceptual Interpretation in GPT Models: Interpretative Insights. Nura Aljaafari, Danilo S. Carvalho, André Freitas. [KELM] 05 Aug 2024.
- Correcting Negative Bias in Large Language Models through Negative Attention Score Alignment. Sangwon Yu, Jongyoon Song, Bongkyu Hwang, Hoyoung Kang, Sooah Cho, Junhwa Choi, Seongho Joe, Taehee Lee, Youngjune Gwon, Sungroh Yoon. 31 Jul 2024.
- A Large Encoder-Decoder Family of Foundation Models For Chemical Language. Eduardo Soares, Victor Shirasuna, E. V. Brazil, Renato F. G. Cerqueira, Dmitry Zubarev, Kristin Schmidt. [AI4CE] 24 Jul 2024.
- Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation. Jirui Qi, Gabriele Sarti, Raquel Fernández, Arianna Bisazza. [RALM] 19 Jun 2024.
- Sharing Matters: Analysing Neurons Across Languages and Tasks in LLMs. Weixuan Wang, Barry Haddow, Wei Peng, Alexandra Birch. [MILM] 13 Jun 2024.
- On Large Language Models' Hallucination with Regard to Known Facts. Che Jiang, Biqing Qi, Xiangyu Hong, Dayuan Fu, Yang Cheng, Fandong Meng, Mo Yu, Bowen Zhou, Jie Zhou. [HILM, LRM] 29 Mar 2024.
- Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions. Clement Neo, Shay B. Cohen, Fazl Barez. 23 Feb 2024.
- The Hidden Space of Transformer Language Adapters. Jesujoba Oluwadara Alabi, Marius Mosbach, Matan Eyal, Dietrich Klakow, Mor Geva. 20 Feb 2024.
- When Only Time Will Tell: Interpreting How Transformers Process Local Ambiguities Through the Lens of Restart-Incrementality. Brielen Madureira, Patrick Kahardipraja, David Schlangen. 20 Feb 2024.
- Explainable Identification of Hate Speech towards Islam using Graph Neural Networks. Azmine Toushik Wasi. 02 Nov 2023.
- Roles of Scaling and Instruction Tuning in Language Perception: Model vs. Human Attention. Changjiang Gao, Shujian Huang, Jixing Li, Jiajun Chen. [LRM, ALM] 29 Oct 2023.
- Why bother with geometry? On the relevance of linear decompositions of Transformer embeddings. Timothee Mickus, Raúl Vázquez. 10 Oct 2023.
- Quantifying the Plausibility of Context Reliance in Neural Machine Translation. Gabriele Sarti, Grzegorz Chrupała, Malvina Nissim, Arianna Bisazza. 02 Oct 2023.
- Neurons in Large Language Models: Dead, N-gram, Positional. Elena Voita, Javier Ferrando, Christoforos Nalmpantis. [MILM] 09 Sep 2023.
- Let the Models Respond: Interpreting Language Model Detoxification Through the Lens of Prompt Dependence. Daniel Scalena, Gabriele Sarti, Malvina Nissim, Elisabetta Fersini. 01 Sep 2023.
- Dissecting Recall of Factual Associations in Auto-Regressive Language Models. Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson. [KELM] 28 Apr 2023.
- Inseq: An Interpretability Toolkit for Sequence Generation Models. Gabriele Sarti, Nils Feldhus, Ludwig Sickert, Oskar van der Wal, Malvina Nissim, Arianna Bisazza. 27 Feb 2023.
- Analyzing Feed-Forward Blocks in Transformers through the Lens of Attention Maps. Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi, Kentaro Inui. 01 Feb 2023.
- Quantifying Context Mixing in Transformers. Hosein Mohebbi, Willem H. Zuidema, Grzegorz Chrupała, A. Alishahi. 30 Jan 2023.
- Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small. Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt. 01 Nov 2022.
- In-context Learning and Induction Heads. Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah. 24 Sep 2022.
- Incorporating Residual and Normalization Layers into Analysis of Masked Language Models. Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi, Kentaro Inui. 15 Sep 2021.
- The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives. Elena Voita, Rico Sennrich, Ivan Titov. 03 Sep 2019.