What Does BERT Look At? An Analysis of BERT's Attention
Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning
11 June 2019 · arXiv:1906.04341
Papers citing "What Does BERT Look At? An Analysis of BERT's Attention" (50 of 883 papers shown)
Universal Dependencies according to BERT: both more specific and more general
Tomasz Limisiewicz, Rudolf Rosa, David Mareček · 30 Apr 2020

Learning Music Helps You Read: Using Transfer to Study Linguistic Structure in Language Models
Isabel Papadimitriou, Dan Jurafsky · 30 Apr 2020

Exploring Contextualized Neural Language Models for Temporal Dependency Parsing
Hayley L Ross, Jon Z. Cai, Bonan Min · 30 Apr 2020

Asking without Telling: Exploring Latent Ontologies in Contextual Representations
Julian Michael, Jan A. Botha, Ian Tenney · 29 Apr 2020

What Happens To BERT Embeddings During Fine-tuning?
Amil Merchant, Elahe Rahimtoroghi, Ellie Pavlick, Ian Tenney · 29 Apr 2020

Towards Transparent and Explainable Attention Models
Akash Kumar Mohankumar, Preksha Nema, Sharan Narasimhan, Mitesh M. Khapra, Balaji Vasan Srinivasan, Balaraman Ravindran · 29 Apr 2020

Quantifying the Contextualization of Word Representations with Semantic Class Probing
Mengjie Zhao, Philipp Dufter, Yadollah Yaghoobzadeh, Hinrich Schütze · 25 Apr 2020

Lite Transformer with Long-Short Range Attention
Zhanghao Wu, Zhijian Liu, Ji Lin, Yujun Lin, Song Han · 24 Apr 2020

Self-Attention Attribution: Interpreting Information Interactions Inside Transformer
Y. Hao, Li Dong, Furu Wei, Ke Xu · 23 Apr 2020

Attention is Not Only a Weight: Analyzing Transformers with Vector Norms
Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi, Kentaro Inui · 21 Apr 2020

What's so special about BERT's layers? A closer look at the NLP pipeline in monolingual and multilingual models
Wietse de Vries, Andreas van Cranenburgh, Malvina Nissim · 14 Apr 2020

Longformer: The Long-Document Transformer
Iz Beltagy, Matthew E. Peters, Arman Cohan · 10 Apr 2020

Telling BERT's full story: from Local Attention to Global Aggregation
Damian Pascual, Gino Brunner, Roger Wattenhofer · 10 Apr 2020

MuTual: A Dataset for Multi-Turn Dialogue Reasoning
Leyang Cui, Yu-Huan Wu, Shujie Liu, Yue Zhang, Ming Zhou · 09 Apr 2020

Multilingual Chart-based Constituency Parse Extraction from Pre-trained Language Models
Taeuk Kim, Bowen Li, Sang-goo Lee · 08 Apr 2020

A Systematic Analysis of Morphological Content in BERT Models for Multiple Languages
Daniel Edmiston · 06 Apr 2020

MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou · 06 Apr 2020

Unsupervised Domain Clusters in Pretrained Language Models
Roee Aharoni, Yoav Goldberg · 05 Apr 2020

Deep Entity Matching with Pre-Trained Language Models
Yuliang Li, Jinfeng Li, Yoshihiko Suhara, A. Doan, W. Tan · 01 Apr 2020

Information-Theoretic Probing with Minimum Description Length
Elena Voita, Ivan Titov · 27 Mar 2020

Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles
Malte Ostendorff, Terry Ruas, M. Schubotz, Georg Rehm, Bela Gipp · 22 Mar 2020

Calibration of Pre-trained Transformers
Shrey Desai, Greg Durrett · 17 Mar 2020

BERT as a Teacher: Contextual Embeddings for Sequence-Level Reward
Florian Schmidt, Thomas Hofmann · 05 Mar 2020

A Primer in BERTology: What we know about how BERT works
Anna Rogers, Olga Kovaleva, Anna Rumshisky · 27 Feb 2020

Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Y. Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, Marianne Winslett · 27 Feb 2020

MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou · 25 Feb 2020

What BERT Sees: Cross-Modal Transfer for Visual Question Generation
Thomas Scialom, Patrick Bordes, Paul-Alexis Dray, Jacopo Staiano, Patrick Gallinari · 25 Feb 2020

Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation
Alessandro Raganato, Yves Scherrer, Jörg Tiedemann · 24 Feb 2020

Investigating Typed Syntactic Dependencies for Targeted Sentiment Classification Using Graph Attention Neural Network
Xuefeng Bai, Pengbo Liu, Yue Zhang · 22 Feb 2020

Federated pretraining and fine tuning of BERT using clinical notes from multiple silos
Dianbo Liu, Timothy A. Miller · 20 Feb 2020

Molecule Attention Transformer
Łukasz Maziarka, Tomasz Danel, Sławomir Mucha, Krzysztof Rataj, Jacek Tabor, Stanisław Jastrzębski · 19 Feb 2020

Feature Importance Estimation with Self-Attention Networks
Blaž Škrlj, S. Džeroski, Nada Lavrač, Matej Petković · 11 Feb 2020

Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction
Taeuk Kim, Jihun Choi, Daniel Edmiston, Sang-goo Lee · 30 Jan 2020

Asking Questions the Human Way: Scalable Question-Answer Generation from Text Corpus
Bang Liu, Haojie Wei, Di Niu, Haolan Chen, Yancheng He · 27 Jan 2020

BERT's output layer recognizes all hidden layers? Some Intriguing Phenomena and a simple way to boost BERT
Wei-Tsung Kao, Tsung-Han Wu, Po-Han Chi, Chun-Cheng Hsieh, Hung-yi Lee · 25 Jan 2020

AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search
Daoyuan Chen, Yaliang Li, Minghui Qiu, Zhen Wang, Bofang Li, Bolin Ding, Hongbo Deng, Jun Huang, Wei Lin, Jingren Zhou · 13 Jan 2020

RECAST: Interactive Auditing of Automatic Toxicity Detection Models
Austin P. Wright, Omar Shaikh, Haekyu Park, Will Epperson, Muhammed Ahmed, Stephane Pinel, Diyi Yang, Duen Horng Chau · 07 Jan 2020

Towards Deep Federated Defenses Against Malware in Cloud Ecosystems
Josh Payne, A. Kundu · 27 Dec 2019

Are Transformers universal approximators of sequence-to-sequence functions?
Chulhee Yun, Srinadh Bhojanapalli, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar · 20 Dec 2019

Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model
Wenhan Xiong, Jingfei Du, William Yang Wang, Veselin Stoyanov · 20 Dec 2019

Cross-Lingual Ability of Multilingual BERT: An Empirical Study
Karthikeyan K, Zihan Wang, Stephen D. Mayhew, Dan Roth · 17 Dec 2019

WaLDORf: Wasteless Language-model Distillation On Reading-comprehension
J. Tian, A. Kreuzer, Pai-Hung Chen, Hans-Martin Will · 13 Dec 2019

Do Attention Heads in BERT Track Syntactic Dependencies?
Phu Mon Htut, Jason Phang, Shikha Bordia, Samuel R. Bowman · 27 Nov 2019

Emotional Neural Language Generation Grounded in Situational Contexts
Sashank Santhanam, Samira Shaikh · 25 Nov 2019

What do you mean, BERT? Assessing BERT as a Distributional Semantics Model
Timothee Mickus, Denis Paperno, Mathieu Constant, Kees van Deemter · 13 Nov 2019

Understanding Multi-Head Attention in Abstractive Summarization
Joris Baan, Maartje ter Hoeve, M. V. D. Wees, Anne Schuth, Maarten de Rijke · 10 Nov 2019

Knowledge Guided Named Entity Recognition for BioMedical Text
Pratyay Banerjee, Kuntal Kumar Pal, M. Devarakonda, Chitta Baral · 10 Nov 2019

Generalizing Natural Language Analysis through Span-relation Representations
Zhengbao Jiang, W. Xu, Jun Araki, Graham Neubig · 10 Nov 2019

What Would Elsa Do? Freezing Layers During Transformer Fine-Tuning
Jaejun Lee, Raphael Tang, Jimmy J. Lin · 08 Nov 2019

Why Do Masked Neural Language Models Still Need Common Sense Knowledge?
Sunjae Kwon, Cheongwoong Kang, Jiyeon Han, Jaesik Choi · 08 Nov 2019