arXiv:1906.04341
What Does BERT Look At? An Analysis of BERT's Attention
11 June 2019
Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning
Papers citing "What Does BERT Look At? An Analysis of BERT's Attention"
50 / 885 papers shown
- Laughing Heads: Can Transformers Detect What Makes a Sentence Funny? (Maxime Peyrard, Beatriz Borges, Kristina Gligorić, Robert West; 19 May 2021)
- Effective Attention Sheds Light On Interpretability (Kaiser Sun, Ana Marasović; 18 May 2021)
- FNet: Mixing Tokens with Fourier Transforms (James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontañón; 09 May 2021)
- Let's Play Mono-Poly: BERT Can Reveal Words' Polysemy Level and Partitionability into Senses (Aina Garí Soler, Marianna Apidianaki; 29 Apr 2021)
- Accounting for Agreement Phenomena in Sentence Comprehension with Transformer Language Models: Effects of Similarity-based Interference on Surprisal and Attention (S. Ryu, Richard L. Lewis; 26 Apr 2021)
- Improving BERT Pretraining with Syntactic Supervision (Georgios Tziafas, Konstantinos Kogkalidis, G. Wijnholds, M. Moortgat; 21 Apr 2021)
- When FastText Pays Attention: Efficient Estimation of Word Representations using Constrained Positional Weighting (Vít Novotný, Michal Štefánik, E. F. Ayetiran, Petr Sojka, Radim Řehůřek; 19 Apr 2021)
- Probing for Bridging Inference in Transformer Language Models (Onkar Pandit, Yufang Hou; 19 Apr 2021)
- BigGreen at SemEval-2021 Task 1: Lexical Complexity Prediction with Assembly Models (A. Islam, Weicheng Ma, Soroush Vosoughi; 19 Apr 2021)
- Knowledge Neurons in Pretrained Transformers (Damai Dai, Li Dong, Y. Hao, Zhifang Sui, Baobao Chang, Furu Wei; 18 Apr 2021)
- Linguistic Dependencies and Statistical Dependence (Jacob Louis Hoover, Alessandro Sordoni, Wenyu Du, Timothy J. O'Donnell; 18 Apr 2021)
- "Average" Approximates "First Principal Component"? An Empirical Analysis on Representations from Neural Language Models (Zihan Wang, Chengyu Dong, Jingbo Shang; 18 Apr 2021)
- Condenser: a Pre-training Architecture for Dense Retrieval (Luyu Gao, Jamie Callan; 16 Apr 2021)
- Supervising Model Attention with Human Explanations for Robust Natural Language Inference (Joe Stacey, Yonatan Belinkov, Marek Rei; 16 Apr 2021)
- Probing Across Time: What Does RoBERTa Know and When? (Leo Z. Liu, Yizhong Wang, Jungo Kasai, Hannaneh Hajishirzi, Noah A. Smith; 16 Apr 2021)
- Sparse Attention with Linear Units (Biao Zhang, Ivan Titov, Rico Sennrich; 14 Apr 2021)
- On the Impact of Knowledge-based Linguistic Annotations in the Quality of Scientific Embeddings (Andrés García-Silva, R. Denaux, José Manuel Gómez-Pérez; 13 Apr 2021)
- Understanding Transformers for Bot Detection in Twitter (Andrés García-Silva, Cristian Berrío, José Manuel Gómez-Pérez; 13 Apr 2021)
- WHOSe Heritage: Classification of UNESCO World Heritage "Outstanding Universal Value" Documents with Soft Labels (Nan Bai, Renqian Luo, Pirouz Nourian, A. Roders; 12 Apr 2021)
- Does syntax matter? A strong baseline for Aspect-based Sentiment Analysis with RoBERTa (Junqi Dai, Hang Yan, Tianxiang Sun, Pengfei Liu, Xipeng Qiu; 11 Apr 2021)
- KI-BERT: Infusing Knowledge Context for Better Language and Domain Understanding (Keyur Faldu, A. Sheth, Prashant Kikani, Hemang Akabari; 09 Apr 2021)
- Transformers: "The End of History" for NLP? (Anton Chernyavskiy, Dmitry Ilvovsky, Preslav Nakov; 09 Apr 2021)
- Low-Complexity Probing via Finding Subnetworks (Steven Cao, Victor Sanh, Alexander M. Rush; 08 Apr 2021)
- Attention Head Masking for Inference Time Content Selection in Abstractive Summarization (Shuyang Cao, Lu Wang; 06 Apr 2021)
- Efficient Attentions for Long Document Summarization (L. Huang, Shuyang Cao, Nikolaus Nova Parulian, Heng Ji, Lu Wang; 05 Apr 2021)
- Compressing Visual-linguistic Model via Knowledge Distillation (Zhiyuan Fang, Jianfeng Wang, Xiaowei Hu, Lijuan Wang, Yezhou Yang, Zicheng Liu; 05 Apr 2021)
- Annotating Columns with Pre-trained Language Models (Yoshihiko Suhara, Jinfeng Li, Yuliang Li, Dan Zhang, Çağatay Demiralp, Chen Chen, W. Tan; 05 Apr 2021)
- A New Approach to Overgenerating and Scoring Abstractive Summaries (Kaiqiang Song, Bingqing Wang, Z. Feng, Fei Liu; 05 Apr 2021)
- Exploring the Role of BERT Token Representations to Explain Sentence Probing Results (Hosein Mohebbi, Ali Modarressi, Mohammad Taher Pilehvar; 03 Apr 2021)
- Do RNN States Encode Abstract Phonological Processes? (Miikka Silfverberg, Francis M. Tyers, Garrett Nicolai, Mans Hulden; 01 Apr 2021)
- Attention, please! A survey of Neural Attention Models in Deep Learning (Alana de Santana Correia, Esther Luna Colombini; 31 Mar 2021)
- Synthesis of Compositional Animations from Textual Descriptions (Anindita Ghosh, N. Cheema, Cennet Oguz, Christian Theobalt, P. Slusallek; 26 Mar 2021)
- Dodrio: Exploring Transformer Models with Interactive Visualization (Zijie J. Wang, Robert Turko, Duen Horng Chau; 26 Mar 2021)
- Zero-shot Sequence Labeling for Transformer-based Sentence Classifiers (Kamil Bujel, H. Yannakoudakis, Marek Rei; 26 Mar 2021)
- Paragraph-level Rationale Extraction through Regularization: A case study on European Court of Human Rights Cases (Ilias Chalkidis, Manos Fergadiotis, D. Tsarapatsanis, Nikolaos Aletras, Ion Androutsopoulos, Prodromos Malakasiotis; 24 Mar 2021)
- The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures (Sushant Singh, A. Mahmood; 23 Mar 2021)
- Repairing Pronouns in Translation with BERT-Based Post-Editing (Reid Pryzant; 23 Mar 2021)
- Bridging the gap between supervised classification and unsupervised topic modelling for social-media assisted crisis management (Mikael Brunila, Rosie Zhao, Andrei Mircea, Sam Lumley, R. Sieber; 22 Mar 2021)
- Local Interpretations for Explainable Natural Language Processing: A Survey (Siwen Luo, Hamish Ivison, S. Han, Josiah Poon; 20 Mar 2021)
- GPT Understands, Too (Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang, Jie Tang; 18 Mar 2021)
- Symbolic integration by integrating learning models with different strengths and weaknesses (Hazumi Kubota, Y. Tokuoka, Takahiro G. Yamada, Akira Funahashi; 09 Mar 2021)
- Few-shot Learning for Slot Tagging with Attentive Relational Network (Cennet Oguz, Ngoc Thang Vu; 03 Mar 2021)
- Transformers with Competitive Ensembles of Independent Mechanisms (Alex Lamb, Di He, Anirudh Goyal, Guolin Ke, Chien-Feng Liao, Mirco Ravanelli, Yoshua Bengio; 27 Feb 2021)
- SparseBERT: Rethinking the Importance Analysis in Self-attention (Han Shi, Jiahui Gao, Xiaozhe Ren, Hang Xu, Xiaodan Liang, Zhenguo Li, James T. Kwok; 25 Feb 2021)
- LazyFormer: Self Attention with Lazy Update (Chengxuan Ying, Guolin Ke, Di He, Tie-Yan Liu; 25 Feb 2021)
- Probing Classifiers: Promises, Shortcomings, and Advances (Yonatan Belinkov; 24 Feb 2021)
- Using Prior Knowledge to Guide BERT's Attention in Semantic Textual Matching Tasks (Tingyu Xia, Yue Wang, Yuan Tian, Yi-Ju Chang; 22 Feb 2021)
- Analyzing Curriculum Learning for Sentiment Analysis along Task Difficulty, Pacing and Visualization Axes (Anvesh Rao Vijjini, Kaveri Anuranjana, R. Mamidi; 19 Feb 2021)
- COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining (Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul N. Bennett, Jiawei Han, Xia Song; 16 Feb 2021)
- Have Attention Heads in BERT Learned Constituency Grammar? (Ziyang Luo; 16 Feb 2021)