arXiv:1906.04341
What Does BERT Look At? An Analysis of BERT's Attention
11 June 2019
Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning
Papers citing "What Does BERT Look At? An Analysis of BERT's Attention"
50 / 885 papers shown
- Laughing Heads: Can Transformers Detect What Makes a Sentence Funny? (Maxime Peyrard, Beatriz Borges, Kristina Gligorić, Robert West; 19 May 2021)
- Effective Attention Sheds Light On Interpretability (Kaiser Sun, Ana Marasović; 18 May 2021)
- FNet: Mixing Tokens with Fourier Transforms (James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontañón; 09 May 2021)
- Let's Play Mono-Poly: BERT Can Reveal Words' Polysemy Level and Partitionability into Senses (Aina Garí Soler, Marianna Apidianaki; 29 Apr 2021)
- Accounting for Agreement Phenomena in Sentence Comprehension with Transformer Language Models: Effects of Similarity-based Interference on Surprisal and Attention (S. Ryu, Richard L. Lewis; 26 Apr 2021)
- Improving BERT Pretraining with Syntactic Supervision (Georgios Tziafas, Konstantinos Kogkalidis, G. Wijnholds, M. Moortgat; 21 Apr 2021)
- When FastText Pays Attention: Efficient Estimation of Word Representations using Constrained Positional Weighting (Vít Novotný, Michal Štefánik, E. F. Ayetiran, Petr Sojka, Radim Řehůřek; 19 Apr 2021)
- Probing for Bridging Inference in Transformer Language Models (Onkar Pandit, Yufang Hou; 19 Apr 2021)
- BigGreen at SemEval-2021 Task 1: Lexical Complexity Prediction with Assembly Models (A. Islam, Weicheng Ma, Soroush Vosoughi; 19 Apr 2021)
- Knowledge Neurons in Pretrained Transformers (Damai Dai, Li Dong, Y. Hao, Zhifang Sui, Baobao Chang, Furu Wei; 18 Apr 2021)
- Linguistic Dependencies and Statistical Dependence (Jacob Louis Hoover, Alessandro Sordoni, Wenyu Du, Timothy J. O'Donnell; 18 Apr 2021)
- "Average" Approximates "First Principal Component"? An Empirical Analysis on Representations from Neural Language Models (Zihan Wang, Chengyu Dong, Jingbo Shang; 18 Apr 2021)
- Condenser: a Pre-training Architecture for Dense Retrieval (Luyu Gao, Jamie Callan; 16 Apr 2021)
- Supervising Model Attention with Human Explanations for Robust Natural Language Inference (Joe Stacey, Yonatan Belinkov, Marek Rei; 16 Apr 2021)
- Probing Across Time: What Does RoBERTa Know and When? (Leo Z. Liu, Yizhong Wang, Jungo Kasai, Hannaneh Hajishirzi, Noah A. Smith; 16 Apr 2021)
- Sparse Attention with Linear Units (Biao Zhang, Ivan Titov, Rico Sennrich; 14 Apr 2021)
- On the Impact of Knowledge-based Linguistic Annotations in the Quality of Scientific Embeddings (Andrés García-Silva, R. Denaux, José Manuel Gómez-Pérez; 13 Apr 2021)
- Understanding Transformers for Bot Detection in Twitter (Andrés García-Silva, Cristian Berrío, José Manuel Gómez-Pérez; 13 Apr 2021)
- WHOSe Heritage: Classification of UNESCO World Heritage "Outstanding Universal Value" Documents with Soft Labels (Nan Bai, Renqian Luo, Pirouz Nourian, A. Roders; 12 Apr 2021)
- Does syntax matter? A strong baseline for Aspect-based Sentiment Analysis with RoBERTa (Junqi Dai, Hang Yan, Tianxiang Sun, Pengfei Liu, Xipeng Qiu; 11 Apr 2021)
- KI-BERT: Infusing Knowledge Context for Better Language and Domain Understanding (Keyur Faldu, A. Sheth, Prashant Kikani, Hemang Akabari; 09 Apr 2021)
- Transformers: "The End of History" for NLP? (Anton Chernyavskiy, Dmitry Ilvovsky, Preslav Nakov; 09 Apr 2021)
- Low-Complexity Probing via Finding Subnetworks (Steven Cao, Victor Sanh, Alexander M. Rush; 08 Apr 2021)
- Attention Head Masking for Inference Time Content Selection in Abstractive Summarization (Shuyang Cao, Lu Wang; 06 Apr 2021)
- Efficient Attentions for Long Document Summarization (L. Huang, Shuyang Cao, Nikolaus Nova Parulian, Heng Ji, Lu Wang; 05 Apr 2021)
- Compressing Visual-linguistic Model via Knowledge Distillation (Zhiyuan Fang, Jianfeng Wang, Xiaowei Hu, Lijuan Wang, Yezhou Yang, Zicheng Liu; 05 Apr 2021)
- Annotating Columns with Pre-trained Language Models (Yoshihiko Suhara, Jinfeng Li, Yuliang Li, Dan Zhang, Çağatay Demiralp, Chen Chen, W. Tan; 05 Apr 2021)
- A New Approach to Overgenerating and Scoring Abstractive Summaries (Kaiqiang Song, Bingqing Wang, Z. Feng, Fei Liu; 05 Apr 2021)
- Exploring the Role of BERT Token Representations to Explain Sentence Probing Results (Hosein Mohebbi, Ali Modarressi, Mohammad Taher Pilehvar; 03 Apr 2021)
- Do RNN States Encode Abstract Phonological Processes? (Miikka Silfverberg, Francis M. Tyers, Garrett Nicolai, Mans Hulden; 01 Apr 2021)
- Attention, please! A survey of Neural Attention Models in Deep Learning (Alana de Santana Correia, Esther Luna Colombini; 31 Mar 2021)
- Synthesis of Compositional Animations from Textual Descriptions (Anindita Ghosh, N. Cheema, Cennet Oguz, Christian Theobalt, P. Slusallek; 26 Mar 2021)
- Dodrio: Exploring Transformer Models with Interactive Visualization (Zijie J. Wang, Robert Turko, Duen Horng Chau; 26 Mar 2021)
- Zero-shot Sequence Labeling for Transformer-based Sentence Classifiers (Kamil Bujel, H. Yannakoudakis, Marek Rei; 26 Mar 2021)
- Paragraph-level Rationale Extraction through Regularization: A case study on European Court of Human Rights Cases (Ilias Chalkidis, Manos Fergadiotis, D. Tsarapatsanis, Nikolaos Aletras, Ion Androutsopoulos, Prodromos Malakasiotis; 24 Mar 2021)
- The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures (Sushant Singh, A. Mahmood; 23 Mar 2021)
- Repairing Pronouns in Translation with BERT-Based Post-Editing (Reid Pryzant; 23 Mar 2021)
- Bridging the gap between supervised classification and unsupervised topic modelling for social-media assisted crisis management (Mikael Brunila, Rosie Zhao, Andrei Mircea, Sam Lumley, R. Sieber; 22 Mar 2021)
- Local Interpretations for Explainable Natural Language Processing: A Survey (Siwen Luo, Hamish Ivison, S. Han, Josiah Poon; 20 Mar 2021)
- GPT Understands, Too (Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang, Jie Tang; 18 Mar 2021)
- Symbolic integration by integrating learning models with different strengths and weaknesses (Hazumi Kubota, Y. Tokuoka, Takahiro G. Yamada, Akira Funahashi; 09 Mar 2021)
- Few-shot Learning for Slot Tagging with Attentive Relational Network (Cennet Oguz, Ngoc Thang Vu; 03 Mar 2021)
- Transformers with Competitive Ensembles of Independent Mechanisms (Alex Lamb, Di He, Anirudh Goyal, Guolin Ke, Chien-Feng Liao, Mirco Ravanelli, Yoshua Bengio; 27 Feb 2021)
- SparseBERT: Rethinking the Importance Analysis in Self-attention (Han Shi, Jiahui Gao, Xiaozhe Ren, Hang Xu, Xiaodan Liang, Zhenguo Li, James T. Kwok; 25 Feb 2021)
- LazyFormer: Self Attention with Lazy Update (Chengxuan Ying, Guolin Ke, Di He, Tie-Yan Liu; 25 Feb 2021)
- Probing Classifiers: Promises, Shortcomings, and Advances (Yonatan Belinkov; 24 Feb 2021)
- Using Prior Knowledge to Guide BERT's Attention in Semantic Textual Matching Tasks (Tingyu Xia, Yue Wang, Yuan Tian, Yi-Ju Chang; 22 Feb 2021)
- Analyzing Curriculum Learning for Sentiment Analysis along Task Difficulty, Pacing and Visualization Axes (Anvesh Rao Vijjini, Kaveri Anuranjana, R. Mamidi; 19 Feb 2021)
- COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining (Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul N. Bennett, Jiawei Han, Xia Song; 16 Feb 2021)
- Have Attention Heads in BERT Learned Constituency Grammar? (Ziyang Luo; 16 Feb 2021)