Analyzing the Structure of Attention in a Transformer Language Model
Jesse Vig, Yonatan Belinkov
arXiv:1906.04284, 7 June 2019

Papers citing "Analyzing the Structure of Attention in a Transformer Language Model"

50 / 225 papers shown
Title
VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers (CVPR 2022)
Estelle Aflalo, Meng Du, Shao-Yen Tseng, Yongfei Liu, Chenfei Wu, Nan Duan, Vasudev Lal
30 Mar 2022

Measuring the Mixing of Contextual Information in the Transformer (EMNLP 2022)
Javier Ferrando, Gerard I. Gállego, Marta R. Costa-jussà
08 Mar 2022

What Do They Capture? -- A Structural Analysis of Pre-Trained Language Models for Source Code (ICSE 2022)
Yao Wan, Wei Zhao, Hongyu Zhang, Yulei Sui, Guandong Xu, Hairong Jin
14 Feb 2022

Investigating Explainability of Generative AI for Code through Scenario-based Design (IUI 2022)
Jiao Sun, Q. V. Liao, Michael J. Muller, Mayank Agarwal, Stephanie Houde, Kartik Talamadupula, Justin D. Weisz
10 Feb 2022

Human Guided Exploitation of Interpretable Attention Patterns in Summarization and Topic Segmentation
Raymond Li, Wen Xiao, Linzi Xing, Lanjun Wang, Gabriel Murray, Giuseppe Carenini
10 Dec 2021

Exploiting a Zoo of Checkpoints for Unseen Tasks
Jiaji Huang, Qiang Qiu, Kenneth Church
05 Nov 2021

Interpreting Deep Learning Models in Natural Language Processing: A Review
Xiaofei Sun, Diyi Yang, Xiaoya Li, Tianwei Zhang, Yuxian Meng, Han Qiu, Guoyin Wang, Eduard H. Hovy, Jiwei Li
20 Oct 2021

Improving Transformers with Probabilistic Attention Keys
Tam Nguyen, T. Nguyen, Dung D. Le, Duy Khuong Nguyen, Viet-Anh Tran, Richard G. Baraniuk, Nhat Ho, Stanley J. Osher
16 Oct 2021

MEDUSA: Multi-scale Encoder-Decoder Self-Attention Deep Neural Network Architecture for Medical Image Analysis (Frontiers in Medicine, 2021)
Hossein Aboutalebi, Maya Pavlova, Hayden Gunraj, M. Shafiee, A. Sabri, Amer Alaref, Alexander Wong
12 Oct 2021

How BPE Affects Memorization in Transformers
Eugene Kharitonov, Marco Baroni, Dieuwke Hupkes
06 Oct 2021

GradTS: A Gradient-Based Automatic Auxiliary Task Selection Method Based on Transformer Networks
Weicheng Ma, Renze Lou, Kai Zhang, Lili Wang, Soroush Vosoughi
13 Sep 2021

Attention-based Contrastive Learning for Winograd Schemas
T. Klein, Moin Nabi
10 Sep 2021

Contributions of Transformer Attention Heads in Multi- and Cross-lingual Tasks
Weicheng Ma, Kai Zhang, Renze Lou, Lili Wang, Soroush Vosoughi
18 Aug 2021

FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention (NeurIPS 2021)
T. Nguyen, Vai Suliafu, Stanley J. Osher, Long Chen, Bao Wang
05 Aug 2021

Multi-Stream Transformers
Andrey Kravchenko, Anna Rumshisky
21 Jul 2021

Transformer-F: A Transformer network with effective methods for learning universal sentence representation
Yu Shi
02 Jul 2021

Eigen Analysis of Self-Attention and its Reconstruction from Partial Computation
Srinadh Bhojanapalli, Ayan Chakrabarti, Himanshu Jain, Sanjiv Kumar, Michal Lukasik, Andreas Veit
16 Jun 2021

Thinking Like Transformers (ICML 2021)
Gail Weiss, Yoav Goldberg, Eran Yahav
13 Jun 2021

FedNLP: An interpretable NLP System to Decode Federal Reserve Communications (SIGIR 2021)
Jean Lee, Hoyoul Luis Youn, Nicholas Stevens, Josiah Poon, S. Han
11 Jun 2021

On the Distribution, Sparsity, and Inference-time Quantization of Attention Values in Transformers (Findings 2021)
Tianchu Ji, Shraddhan Jain, M. Ferdman, Peter Milder, H. Andrew Schwartz, Niranjan Balasubramanian
02 Jun 2021

Do Multilingual Neural Machine Translation Models Contain Language Pair Specific Attention Heads? (Findings 2021)
Min Namgung, Laurent Besacier, Vassilina Nikoulina, D. Schwab
31 May 2021

On the Interplay Between Fine-tuning and Composition in Transformers (Findings 2021)
Lang-Chi Yu, Allyson Ettinger
31 May 2021

Effective Attention Sheds Light On Interpretability (Findings 2021)
Kaiser Sun, Ana Marasović
18 May 2021

FNet: Mixing Tokens with Fourier Transforms (NAACL 2021)
James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon
09 May 2021

Accounting for Agreement Phenomena in Sentence Comprehension with Transformer Language Models: Effects of Similarity-based Interference on Surprisal and Attention (CMCL 2021)
S. Ryu, Richard L. Lewis
26 Apr 2021

Morph Call: Probing Morphosyntactic Content of Multilingual Transformers
Vladislav Mikhailov, O. Serikov, Ekaterina Artemova
26 Apr 2021

Knowledge Neurons in Pretrained Transformers (ACL 2021)
Damai Dai, Li Dong, Y. Hao, Zhifang Sui, Baobao Chang, Furu Wei
18 Apr 2021

Supervising Model Attention with Human Explanations for Robust Natural Language Inference (AAAI 2021)
Joe Stacey, Yonatan Belinkov, Marek Rei
16 Apr 2021

Pose Recognition with Cascade Transformers (CVPR 2021)
Ke Li, Shijie Wang, Xiang Zhang, Yifan Xu, Weijian Xu, Zhuowen Tu
14 Apr 2021

Attention, please! A survey of Neural Attention Models in Deep Learning (Artificial Intelligence Review, 2021)
Alana de Santana Correia, Esther Luna Colombini
31 Mar 2021

Rethinking Spatial Dimensions of Vision Transformers (ICCV 2021)
Byeongho Heo, Sangdoo Yun, Dongyoon Han, Sanghyuk Chun, Junsuk Choe, Seong Joon Oh
30 Mar 2021

LazyFormer: Self Attention with Lazy Update
Chengxuan Ying, Guolin Ke, Di He, Tie-Yan Liu
25 Feb 2021

To Understand Representation of Layer-aware Sequence Encoders as Multi-order-graph
Sufeng Duan, Hai Zhao
16 Jan 2021

On-the-Fly Attention Modulation for Neural Generation (Findings 2021)
Yue Dong, Chandra Bhagavatula, Ximing Lu, Jena D. Hwang, Antoine Bosselut, Jackie C.K. Cheung, Yejin Choi
02 Jan 2021

On Explaining Your Explanations of BERT: An Empirical Study with Sequence Classification
Zhengxuan Wu, Desmond C. Ong
01 Jan 2021

Transformer Feed-Forward Layers Are Key-Value Memories (EMNLP 2020)
Mor Geva, R. Schuster, Jonathan Berant, Omer Levy
29 Dec 2020

Understood in Translation, Transformers for Domain Understanding
Dimitrios Christofidellis, Matteo Manica, L. Georgopoulos, Hans Vandierendonck
18 Dec 2020

Mask-Align: Self-Supervised Neural Word Alignment (ACL 2020)
Chi Chen, Maosong Sun, Yang Liu
13 Dec 2020

Do We Really Need That Many Parameters In Transformer For Extractive Summarization? Discourse Can Help!
Wen Xiao, Patrick Huber, Giuseppe Carenini
03 Dec 2020

Self-Explaining Structures Improve NLP Models
Zijun Sun, Chun Fan, Qinghong Han, Xiaofei Sun, Yuxian Meng, Leilei Gan, Jiwei Li
03 Dec 2020

Data-Informed Global Sparseness in Attention Mechanisms for Deep Neural Networks (LREC 2020)
Ileana Rugina, Rumen Dangovski, L. Jing, Preslav Nakov, Marin Soljacic
20 Nov 2020

Focus on the present: a regularization method for the ASR source-target attention layer (ICASSP 2020)
Nanxin Chen, Piotr Żelasko, Jesús Villalba, Najim Dehak
02 Nov 2020

Influence Patterns for Explaining Information Flow in BERT (NeurIPS 2020)
Kaiji Lu, Zifan Wang, Piotr Mardziel, Anupam Datta
02 Nov 2020

Improving BERT Performance for Aspect-Based Sentiment Analysis
Akbar Karimi, L. Rossi, Andrea Prati
22 Oct 2020

Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling
Wenxuan Zhou, Kevin Huang, Tengyu Ma, Jing Huang
21 Oct 2020

Pair the Dots: Jointly Examining Training History and Test Stimuli for Model Interpretability
Yuxian Meng, Chun Fan, Zijun Sun, Eduard H. Hovy, Leilei Gan, Jiwei Li
14 Oct 2020

Interpreting Attention Models with Human Visual Attention in Machine Reading Comprehension
Ekta Sood, Simon Tannert, Diego Frassinelli, Andreas Bulling, Ngoc Thang Vu
13 Oct 2020

Structured Self-Attention Weights Encode Semantics in Sentiment Analysis (BlackboxNLP 2020)
Zhengxuan Wu, Thanh-Son Nguyen, Desmond C. Ong
10 Oct 2020

Assessing Phrasal Representation and Composition in Transformers
Lang-Chi Yu, Allyson Ettinger
08 Oct 2020

Linguistic Profiling of a Neural Language Model (COLING 2020)
Alessio Miaschi, D. Brunato, F. Dell'Orletta, Giulia Venturi
05 Oct 2020