Analyzing the Structure of Attention in a Transformer Language Model
Jesse Vig, Yonatan Belinkov
arXiv:1906.04284 · 7 June 2019
Papers citing "Analyzing the Structure of Attention in a Transformer Language Model" (45 of 45 papers shown)
Are formal and functional linguistic mechanisms dissociated in language models?
Michael Hanna, Sandro Pezzelle, Yonatan Belinkov · 14 Mar 2025

Transformer Meets Twicing: Harnessing Unattended Residual Information
Laziz U. Abdullaev, Tan M. Nguyen · 02 Mar 2025

Retrieval Backward Attention without Additional Training: Enhance Embeddings of Large Language Models via Repetition
Yifei Duan, Raphael Shang, Deng Liang, Yongqiang Cai · 28 Feb 2025

Selective Prompt Anchoring for Code Generation
Yuan Tian, Tianyi Zhang · 24 Feb 2025

MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
Da Xiao, Qingye Meng, Shengping Li, Xingyuan Yuan · 13 Feb 2025 · [MoE, AI4CE]

Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models
Michael Toker, Ido Galil, Hadas Orgad, Rinon Gal, Yoad Tewel, Gal Chechik, Yonatan Belinkov · 12 Jan 2025 · [DiffM]

Dynamic Attention-Guided Context Decoding for Mitigating Context Faithfulness Hallucinations in Large Language Models
Yanwen Huang, Yong Zhang, Ning Cheng, Zhitao Li, Shaojun Wang, Jing Xiao · 02 Jan 2025

ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models
N. Jha, Brandon Reagen · 12 Oct 2024 · [OffRL, AI4CE]

DeciMamba: Exploring the Length Extrapolation Potential of Mamba
Assaf Ben-Kish, Itamar Zimerman, Shady Abu Hussein, Nadav Cohen, Amir Globerson, Lior Wolf, Raja Giryes · 20 Jun 2024 · [Mamba]

On the Challenges and Opportunities in Generative AI
Laura Manduchi, Kushagra Pandey, Robert Bamler, Ryan Cotterell, Sina Daubener, ..., F. Wenzel, Frank Wood, Stephan Mandt, Vincent Fortuin · 28 Feb 2024

On the Importance of Step-wise Embeddings for Heterogeneous Clinical Time-Series
Rita Kuznetsova, Alizée Pace, Manuel Burger, Hugo Yèche, Gunnar Rätsch · 15 Nov 2023 · [AI4TS]

Uncovering Intermediate Variables in Transformers using Circuit Probing
Michael A. Lepori, Thomas Serre, Ellie Pavlick · 07 Nov 2023

Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models
Yifan Hou, Jiaoda Li, Yu Fei, Alessandro Stolfo, Wangchunshu Zhou, Guangtao Zeng, Antoine Bosselut, Mrinmaya Sachan · 23 Oct 2023 · [LRM]

Interpreting and Exploiting Functional Specialization in Multi-Head Attention under Multi-task Learning
Chong Li, Shaonan Wang, Yunhao Zhang, Jiajun Zhang, Chengqing Zong · 16 Oct 2023

AI for the Generation and Testing of Ideas Towards an AI Supported Knowledge Development Environment
T. Selker · 17 Jul 2023

Incorporating Distributions of Discourse Structure for Long Document Abstractive Summarization
Dongqi Pu, Yifa Wang, Vera Demberg · 26 May 2023

End-to-End Simultaneous Speech Translation with Differentiable Segmentation
Shaolei Zhang, Yang Feng · 25 May 2023

State Spaces Aren't Enough: Machine Translation Needs Attention
Ali Vardasbi, Telmo Pires, Robin M. Schmidt, Stephan Peitz · 25 Apr 2023

LasUIE: Unifying Information Extraction with Latent Adaptive Structure-aware Generative Language Model
Hao Fei, Shengqiong Wu, Jingye Li, Bobo Li, Fei Li, Libo Qin, Meishan Zhang, M. Zhang, Tat-Seng Chua · 13 Apr 2023

Interpretability in Activation Space Analysis of Transformers: A Focused Survey
Soniya Vijayakumar · 22 Jan 2023 · [AI4CE]

Dissociating language and thought in large language models
Kyle Mahowald, Anna A. Ivanova, I. Blank, Nancy Kanwisher, J. Tenenbaum, Evelina Fedorenko · 16 Jan 2023 · [ELM, ReLM]

Skip-Attention: Improving Vision Transformers by Paying Less Attention
Shashanka Venkataramanan, Amir Ghodrati, Yuki M. Asano, Fatih Porikli, A. Habibian · 05 Jan 2023 · [ViT]

On the Blind Spots of Model-Based Evaluation Metrics for Text Generation
Tianxing He, Jingyu Zhang, Tianle Wang, Sachin Kumar, Kyunghyun Cho, James R. Glass, Yulia Tsvetkov · 20 Dec 2022

Explanation on Pretraining Bias of Finetuned Vision Transformer
Bumjin Park, Jaesik Choi · 18 Nov 2022 · [ViT]

On the Explainability of Natural Language Processing Deep Models
Julia El Zini, M. Awad · 13 Oct 2022

CAT-probing: A Metric-based Approach to Interpret How Pre-trained Models for Programming Language Attend Code Structure
Nuo Chen, Qiushi Sun, Renyu Zhu, Xiang Li, Xuesong Lu, Ming Gao · 07 Oct 2022

Beware the Rationalization Trap! When Language Model Explainability Diverges from our Mental Models of Language
R. Sevastjanova, Mennatallah El-Assady · 14 Jul 2022 · [LRM]

What Do Compressed Multilingual Machine Translation Models Forget?
Alireza Mohammadshahi, Vassilina Nikoulina, Alexandre Berard, Caroline Brun, James Henderson, Laurent Besacier · 22 May 2022 · [AI4CE]

Learning from Bootstrapping and Stepwise Reinforcement Reward: A Semi-Supervised Framework for Text Style Transfer
Zhengyuan Liu, Nancy F. Chen · 19 May 2022

Silence is Sweeter Than Speech: Self-Supervised Model Using Silence to Store Speaker Information
Chiyu Feng, Po-Chun Hsu, Hung-yi Lee · 08 May 2022 · [SSL]

Paying More Attention to Self-attention: Improving Pre-trained Language Models via Attention Guiding
Shanshan Wang, Zhumin Chen, Z. Ren, Huasheng Liang, Qiang Yan, Pengjie Ren · 06 Apr 2022

VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers
Estelle Aflalo, Meng Du, Shao-Yen Tseng, Yongfei Liu, Chenfei Wu, Nan Duan, Vasudev Lal · 30 Mar 2022

Measuring the Mixing of Contextual Information in the Transformer
Javier Ferrando, Gerard I. Gállego, Marta R. Costa-jussá · 08 Mar 2022

What Do They Capture? -- A Structural Analysis of Pre-Trained Language Models for Source Code
Yao Wan, Wei-Ye Zhao, Hongyu Zhang, Yulei Sui, Guandong Xu, Hairong Jin · 14 Feb 2022

Interpreting Deep Learning Models in Natural Language Processing: A Review
Xiaofei Sun, Diyi Yang, Xiaoya Li, Tianwei Zhang, Yuxian Meng, Han Qiu, Guoyin Wang, Eduard H. Hovy, Jiwei Li · 20 Oct 2021

MEDUSA: Multi-scale Encoder-Decoder Self-Attention Deep Neural Network Architecture for Medical Image Analysis
Hossein Aboutalebi, Maya Pavlova, Hayden Gunraj, M. Shafiee, A. Sabri, Amer Alaref, Alexander Wong · 12 Oct 2021

Thinking Like Transformers
Gail Weiss, Yoav Goldberg, Eran Yahav · 13 Jun 2021 · [AI4CE]

Supervising Model Attention with Human Explanations for Robust Natural Language Inference
Joe Stacey, Yonatan Belinkov, Marek Rei · 16 Apr 2021

Pose Recognition with Cascade Transformers
Ke Li, Shijie Wang, Xiang Zhang, Yifan Xu, Weijian Xu, Z. Tu · 14 Apr 2021 · [ViT]

Rethinking Spatial Dimensions of Vision Transformers
Byeongho Heo, Sangdoo Yun, Dongyoon Han, Sanghyuk Chun, Junsuk Choe, Seong Joon Oh · 30 Mar 2021 · [ViT]

Transformer Feed-Forward Layers Are Key-Value Memories
Mor Geva, R. Schuster, Jonathan Berant, Omer Levy · 29 Dec 2020 · [KELM]

Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling
Wenxuan Zhou, Kevin Huang, Tengyu Ma, Jing Huang · 21 Oct 2020

Towards Transparent and Explainable Attention Models
Akash Kumar Mohankumar, Preksha Nema, Sharan Narasimhan, Mitesh M. Khapra, Balaji Vasan Srinivasan, Balaraman Ravindran · 29 Apr 2020

Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation
Alessandro Raganato, Yves Scherrer, Jörg Tiedemann · 24 Feb 2020

SANVis: Visual Analytics for Understanding Self-Attention Networks
Cheonbok Park, Inyoup Na, Yongjang Jo, Sungbok Shin, J. Yoo, Bum Chul Kwon, Jian Zhao, Hyungjong Noh, Yeonsoo Lee, Jaegul Choo · 13 Sep 2019 · [HAI]