Explaining How Transformers Use Context to Build Predictions
Javier Ferrando, Gerard I. Gállego, Ioannis Tsiamas, Marta R. Costa-jussà
arXiv 2305.12535 · 21 May 2023
Papers citing "Explaining How Transformers Use Context to Build Predictions" (28 papers):
- A Close Look at Decomposition-based XAI-Methods for Transformer Language Models. L. Arras, Bruno Puri, Patrick Kahardipraja, Sebastian Lapuschkin, Wojciech Samek. 21 Feb 2025.
- LLMs as a synthesis between symbolic and continuous approaches to language. Gemma Boleda. [SyDa] 17 Feb 2025.
- Can Input Attributions Interpret the Inductive Reasoning Process Elicited in In-Context Learning? Mengyu Ye, Tatsuki Kuribayashi, Goro Kobayashi, Jun Suzuki. [LRM] 20 Dec 2024.
- On Explaining with Attention Matrices. Omar Naim, Nicholas Asher. 24 Oct 2024.
- Counterfactuals As a Means for Evaluating Faithfulness of Attribution Methods in Autoregressive Language Models. Sepehr Kamahi, Yadollah Yaghoobzadeh. 21 Aug 2024.
- The Mechanics of Conceptual Interpretation in GPT Models: Interpretative Insights. Nura Aljaafari, Danilo S. Carvalho, André Freitas. [KELM] 05 Aug 2024.
- Correcting Negative Bias in Large Language Models through Negative Attention Score Alignment. Sangwon Yu, Jongyoon Song, Bongkyu Hwang, Hoyoung Kang, Sooah Cho, Junhwa Choi, Seongho Joe, Taehee Lee, Youngjune Gwon, Sungroh Yoon. 31 Jul 2024.
- A Large Encoder-Decoder Family of Foundation Models For Chemical Language. Eduardo Soares, Victor Shirasuna, E. V. Brazil, Renato F. G. Cerqueira, Dmitry Zubarev, Kristin Schmidt. [AI4CE] 24 Jul 2024.
- Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation. Jirui Qi, Gabriele Sarti, Raquel Fernández, Arianna Bisazza. [RALM] 19 Jun 2024.
- Sharing Matters: Analysing Neurons Across Languages and Tasks in LLMs. Weixuan Wang, Barry Haddow, Wei Peng, Alexandra Birch. [MILM] 13 Jun 2024.
- On Large Language Models' Hallucination with Regard to Known Facts. Che Jiang, Biqing Qi, Xiangyu Hong, Dayuan Fu, Yang Cheng, Fandong Meng, Mo Yu, Bowen Zhou, Jie Zhou. [HILM, LRM] 29 Mar 2024.
- Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions. Clement Neo, Shay B. Cohen, Fazl Barez. 23 Feb 2024.
- The Hidden Space of Transformer Language Adapters. Jesujoba Oluwadara Alabi, Marius Mosbach, Matan Eyal, Dietrich Klakow, Mor Geva. 20 Feb 2024.
- When Only Time Will Tell: Interpreting How Transformers Process Local Ambiguities Through the Lens of Restart-Incrementality. Brielen Madureira, Patrick Kahardipraja, David Schlangen. 20 Feb 2024.
- Explainable Identification of Hate Speech towards Islam using Graph Neural Networks. Azmine Toushik Wasi. 02 Nov 2023.
- Roles of Scaling and Instruction Tuning in Language Perception: Model vs. Human Attention. Changjiang Gao, Shujian Huang, Jixing Li, Jiajun Chen. [LRM, ALM] 29 Oct 2023.
- Why bother with geometry? On the relevance of linear decompositions of Transformer embeddings. Timothee Mickus, Raúl Vázquez. 10 Oct 2023.
- Quantifying the Plausibility of Context Reliance in Neural Machine Translation. Gabriele Sarti, Grzegorz Chrupała, Malvina Nissim, Arianna Bisazza. 02 Oct 2023.
- Neurons in Large Language Models: Dead, N-gram, Positional. Elena Voita, Javier Ferrando, Christoforos Nalmpantis. [MILM] 09 Sep 2023.
- Let the Models Respond: Interpreting Language Model Detoxification Through the Lens of Prompt Dependence. Daniel Scalena, Gabriele Sarti, Malvina Nissim, Elisabetta Fersini. 01 Sep 2023.
- Dissecting Recall of Factual Associations in Auto-Regressive Language Models. Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson. [KELM] 28 Apr 2023.
- Inseq: An Interpretability Toolkit for Sequence Generation Models. Gabriele Sarti, Nils Feldhus, Ludwig Sickert, Oskar van der Wal, Malvina Nissim, Arianna Bisazza. 27 Feb 2023.
- Analyzing Feed-Forward Blocks in Transformers through the Lens of Attention Maps. Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi, Kentaro Inui. 01 Feb 2023.
- Quantifying Context Mixing in Transformers. Hosein Mohebbi, Willem H. Zuidema, Grzegorz Chrupała, A. Alishahi. 30 Jan 2023.
- Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small. Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt. 01 Nov 2022.
- In-context Learning and Induction Heads. Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah. 24 Sep 2022.
- Incorporating Residual and Normalization Layers into Analysis of Masked Language Models. Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi, Kentaro Inui. 15 Sep 2021.
- The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives. Elena Voita, Rico Sennrich, Ivan Titov. 03 Sep 2019.