What Does BERT Look At? An Analysis of BERT's Attention (arXiv 1906.04341)
Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning
11 June 2019 · MILM

Papers citing "What Does BERT Look At? An Analysis of BERT's Attention"
50 of 885 papers shown
Char2Subword: Extending the Subword Embedding Space Using Robust Character Compositionality
Gustavo Aguilar, Bryan McCann, Tong Niu, Nazneen Rajani, N. Keskar, Thamar Solorio · 24 Oct 2020
Long Document Ranking with Query-Directed Sparse Transformer
Jyun-Yu Jiang, Chenyan Xiong, Chia-Jung Lee, Wei Wang · 23 Oct 2020
Language Models are Open Knowledge Graphs
Chenguang Wang, Xiao Liu, D. Song · SSL, KELM · 22 Oct 2020
Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling
Wenxuan Zhou, Kevin Huang, Tengyu Ma, Jing Huang · 21 Oct 2020
TurnGPT: a Transformer-based Language Model for Predicting Turn-taking in Spoken Dialog
Erik Ekstedt, Gabriel Skantze · 21 Oct 2020
Better Highlighting: Creating Sub-Sentence Summary Highlights
Sangwoo Cho, Kaiqiang Song, Chen Li, Dong Yu, H. Foroosh, Fei Liu · 20 Oct 2020
Optimal Subarchitecture Extraction For BERT
Adrian de Wynter, Daniel J. Perry · MQ · 20 Oct 2020
A Benchmark for Lease Contract Review
Spyretta Leivaditi, Julien Rossi, Evangelos Kanoulas · AILaw · 20 Oct 2020
Heads-up! Unsupervised Constituency Parsing via Self-Attention Heads
Bowen Li, Taeuk Kim, Reinald Kim Amplayo, Frank Keller · SSL · 19 Oct 2020
Towards Interpreting BERT for Reading Comprehension Based QA
Sahana Ramnath, Preksha Nema, Deep Sahni, Mitesh M. Khapra · 18 Oct 2020
HABERTOR: An Efficient and Effective Deep Hatespeech Detector
T. Tran, Yifan Hu, Changwei Hu, Kevin Yen, Fei Tan, Kyumin Lee, Serim Park · VLM · 17 Oct 2020
Example-Driven Intent Prediction with Observers
Shikib Mehri, Mihail Eric · 17 Oct 2020
Mischief: A Simple Black-Box Attack Against Transformer Architectures
Adrian de Wynter · AAML · 16 Oct 2020
Delaying Interaction Layers in Transformer-based Encoders for Efficient Open Domain Question Answering
W. Siblini, Mohamed Challal, Charlotte Pasqual · 16 Oct 2020
Detecting ESG topics using domain-specific language models and data augmentation approaches
Timothy Nugent, N. Stelea, Jochen L. Leidner · 16 Oct 2020
Understanding Neural Abstractive Summarization Models via Uncertainty
Jiacheng Xu, Shrey Desai, Greg Durrett · UQLM · 15 Oct 2020
Does Chinese BERT Encode Word Structure?
Yile Wang, Leyang Cui, Yue Zhang · 15 Oct 2020
Pair the Dots: Jointly Examining Training History and Test Stimuli for Model Interpretability
Yuxian Meng, Chun Fan, Zijun Sun, Eduard H. Hovy, Fei Wu, Jiwei Li · FAtt · 14 Oct 2020
Pretrained Transformers for Text Ranking: BERT and Beyond
Jimmy J. Lin, Rodrigo Nogueira, Andrew Yates · VLM · 13 Oct 2020
BERT-EMD: Many-to-Many Layer Mapping for BERT Compression with Earth Mover's Distance
Jianquan Li, Xiaokang Liu, Honghong Zhao, Ruifeng Xu, Min Yang, Yaohong Jin · 13 Oct 2020
Zero-shot Entity Linking with Efficient Long Range Sequence Modeling
Zonghai Yao, Liangliang Cao, Huapu Pan · VLM · 12 Oct 2020
Layer-wise Guided Training for BERT: Learning Incrementally Refined Document Representations
Nikolaos Manginas, Ilias Chalkidis, Prodromos Malakasiotis · 12 Oct 2020
EFSG: Evolutionary Fooling Sentences Generator
Marco Di Giovanni, Marco Brambilla · AAML · 12 Oct 2020
Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually)
Alex Warstadt, Yian Zhang, Haau-Sing Li, Haokun Liu, Samuel R. Bowman · SSL, AI4CE · 11 Oct 2020
Structured Self-Attention Weights Encode Semantics in Sentiment Analysis
Zhengxuan Wu, Thanh-Son Nguyen, Desmond C. Ong · MILM · 10 Oct 2020
Query-Key Normalization for Transformers
Alex Henry, Prudhvi Raj Dachapally, S. Pawar, Yuxuan Chen · 08 Oct 2020
BERTering RAMS: What and How Much does BERT Already Know About Event Arguments? -- A Study on the RAMS Dataset
Varun Gangal, Eduard H. Hovy · 08 Oct 2020
Two are Better than One: Joint Entity and Relation Extraction with Table-Sequence Encoders
Jue Wang, Wei Lu · 08 Oct 2020
Assessing Phrasal Representation and Composition in Transformers
Lang-Chi Yu, Allyson Ettinger · CoGe · 08 Oct 2020
Learning to Fuse Sentences with Transformers for Summarization
Logan Lebanoff, Franck Dernoncourt, Doo Soon Kim, Lidan Wang, W. Chang, Fei Liu · 08 Oct 2020
PyMT5: multi-mode translation of natural language and Python code with transformers
Colin B. Clement, Dawn Drain, Jonathan Timcheck, Alexey Svyatkovskiy, Neel Sundaresan · 07 Oct 2020
Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers
Yimeng Wu, Peyman Passban, Mehdi Rezagholizade, Qun Liu · MoE · 06 Oct 2020
Intrinsic Probing through Dimension Selection
Lucas Torroba Hennigen, Adina Williams, Ryan Cotterell · 06 Oct 2020
Analyzing Individual Neurons in Pre-trained Language Models
Nadir Durrani, Hassan Sajjad, Fahim Dalvi, Yonatan Belinkov · MILM · 06 Oct 2020
BERT Knows Punta Cana is not just beautiful, it's gorgeous: Ranking Scalar Adjectives with Contextualised Representations
Aina Garí Soler, Marianna Apidianaki · 06 Oct 2020
LSTMs Compose (and Learn) Bottom-Up
Naomi Saphra, Adam Lopez · CoGe · 06 Oct 2020
Finding the Evidence: Localization-aware Answer Prediction for Text Visual Question Answering
Wei Han, Hantao Huang, Tao Han · 06 Oct 2020
On the Branching Bias of Syntax Extracted from Pre-trained Language Models
Huayang Li, Lemao Liu, Guoping Huang, Shuming Shi · 06 Oct 2020
Guiding Attention for Self-Supervised Learning with Transformers
A. Deshpande, Karthik Narasimhan · 06 Oct 2020
Improving Neural Topic Models using Knowledge Distillation
Alexander Miserlis Hoyle, Pranav Goel, Philip Resnik · 05 Oct 2020
Linguistic Profiling of a Neural Language Model
Alessio Miaschi, D. Brunato, F. Dell’Orletta, Giulia Venturi · 05 Oct 2020
Syntax Representation in Word Embeddings and Neural Networks -- A Survey
Tomasz Limisiewicz, David Marecek · NAI · 02 Oct 2020
LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention
Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto · 02 Oct 2020
Which *BERT? A Survey Organizing Contextualized Encoders
Patrick Xia, Shijie Wu, Benjamin Van Durme · 02 Oct 2020
XDA: Accurate, Robust Disassembly with Transfer Learning
Kexin Pei, Jonas Guan, David Williams-King, Junfeng Yang, Suman Jana · 02 Oct 2020
How LSTM Encodes Syntax: Exploring Context Vectors and Semi-Quantization on Natural Text
Chihiro Shibata, Kei Uchiumi, D. Mochihashi · 01 Oct 2020
Dual Attention Model for Citation Recommendation
Yang Zhang, Qiang Ma · 01 Oct 2020
Bridging Information-Seeking Human Gaze and Machine Reading Comprehension
J. Malmaud, R. Levy, Yevgeni Berzak · 30 Sep 2020
Gender prediction using limited Twitter Data
M. Burghoorn, M. D. Boer, S. Raaijmakers · 29 Sep 2020
A Token-wise CNN-based Method for Sentence Compression
Weiwei Hou, H. Suominen, Piotr Koniusz, Sabrina Caldwell, Tom Gedeon · 23 Sep 2020