What Does BERT Look At? An Analysis of BERT's Attention
Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning
11 June 2019 · arXiv:1906.04341 [MILM]

Papers citing "What Does BERT Look At? An Analysis of BERT's Attention" (50 of 885 shown)

1. A Study of the Attention Abnormality in Trojaned BERTs · Weimin Lyu, Songzhu Zheng, Teng Ma, Chao Chen · 13 May 2022
2. Exploiting Inductive Bias in Transformers for Unsupervised Disentanglement of Syntax and Semantics with VAEs · G. Felhi, Joseph Le Roux, Djamé Seddah · 12 May 2022 [DRL]
3. A Song of (Dis)agreement: Evaluating the Evaluation of Explainable Artificial Intelligence in Natural Language Processing · Michael Neely, Stefan F. Schouten, Maurits J. R. Bleeker, Ana Lucic · 9 May 2022 [XAI]
4. Unsupervised Slot Schema Induction for Task-oriented Dialog · Dian Yu, Mingqiu Wang, Yuan Cao, Izhak Shafran, Laurent El Shafey, H. Soltau · 9 May 2022
5. EigenNoise: A Contrastive Prior to Warm-Start Representations · H. Heidenreich, Jake Williams · 9 May 2022
6. Silence is Sweeter Than Speech: Self-Supervised Model Using Silence to Store Speaker Information · Chiyu Feng, Po-Chun Hsu, Hung-yi Lee · 8 May 2022 [SSL]
7. When a sentence does not introduce a discourse entity, Transformer-based models still sometimes refer to it · Sebastian Schuster, Tal Linzen · 6 May 2022
8. GlobEnc: Quantifying Global Token Attribution by Incorporating the Whole Encoder Layer in Transformers · Ali Modarressi, Mohsen Fayyaz, Yadollah Yaghoobzadeh, Mohammad Taher Pilehvar · 6 May 2022 [ViT]
9. Diversifying Neural Dialogue Generation via Negative Distillation · Yiwei Li, Shaoxiong Feng, Bin Sun, Kan Li · 5 May 2022
10. Adaptable Adapters · N. Moosavi, Quentin Delfosse, Kristian Kersting, Iryna Gurevych · 3 May 2022
11. BERTops: Studying BERT Representations under a Topological Lens · Jatin Chauhan, Manohar Kaul · 2 May 2022
12. POLITICS: Pretraining with Same-story Article Comparison for Ideology Prediction and Stance Detection · Yujian Liu, Xinliang Frederick Zhang, David Wegsman, Nick Beauchamp, Lu Wang · 2 May 2022
13. Visualizing and Explaining Language Models · Adrian M. P. Braşoveanu, Razvan Andonie · 30 Apr 2022 [MILM, VLM]
14. RobBERTje: a Distilled Dutch BERT Model · Pieter Delobelle, Thomas Winters, Bettina Berendt · 28 Apr 2022
15. Attention Mechanism in Neural Networks: Where it Comes and Where it Goes · Derya Soydaner · 27 Apr 2022 [3DV]
16. Do Transformer Models Show Similar Attention Patterns to Task-Specific Human Gaze? · Stephanie Brandl, Oliver Eberle, Jonas Pilot, Anders Søgaard · 25 Apr 2022
17. Grad-SAM: Explaining Transformers via Gradient Self-Attention Maps · Oren Barkan, Edan Hauon, Avi Caciularu, Ori Katz, Itzik Malkiel, Omri Armstrong, Noam Koenigstein · 23 Apr 2022
18. An Attention-Based Model for Predicting Contextual Informativeness and Curriculum Learning Applications · Sungjin Nam, David Jurgens, Gwen Frishkoff, Kevyn Collins-Thompson · 21 Apr 2022
19. Probing Script Knowledge from Pre-Trained Models · Zijian Jin, Xingyu Zhang, Mo Yu, Lifu Huang · 16 Apr 2022
20. A Review on Language Models as Knowledge Bases · Badr AlKhamissi, Millicent Li, Asli Celikyilmaz, Mona T. Diab, Marjan Ghazvininejad · 12 Apr 2022 [KELM]
21. What do Toothbrushes do in the Kitchen? How Transformers Think our World is Structured · Alexander Henlein, Alexander Mehler · 12 Apr 2022
22. Team ÚFAL at CMCL 2022 Shared Task: Figuring out the correct recipe for predicting Eye-Tracking features using Pretrained Language Models · Sunit Bhattacharya, Rishu Kumar, Ondrej Bojar · 11 Apr 2022
23. How Conservative are Language Models? Adapting to the Introduction of Gender-Neutral Pronouns · Stephanie Brandl, Ruixiang Cui, Anders Søgaard · 11 Apr 2022
24. Contextual Representation Learning beyond Masked Language Modeling · Zhiyi Fu, Wangchunshu Zhou, Jingjing Xu, Hao Zhou, Lei Li · 8 Apr 2022
25. Paying More Attention to Self-attention: Improving Pre-trained Language Models via Attention Guiding · Shanshan Wang, Zhumin Chen, Z. Ren, Huasheng Liang, Qiang Yan, Pengjie Ren · 6 Apr 2022
26. An Exploratory Study on Code Attention in BERT · Rishab Sharma, Fuxiang Chen, Fatemeh H. Fard, David Lo · 5 Apr 2022
27. On Explaining Multimodal Hateful Meme Detection Models · Ming Shan Hee, Roy Ka-Wei Lee, Wen-Haw Chong · 4 Apr 2022 [VLM]
28. Incorporating Dynamic Semantics into Pre-Trained Language Model for Aspect-based Sentiment Analysis · Kai Zhang, Kunpeng Zhang, Mengdi Zhang, Hongke Zhao, Qi Liu, Wei Yu Wu, Enhong Chen · 30 Mar 2022
29. VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers · Estelle Aflalo, Meng Du, Shao-Yen Tseng, Yongfei Liu, Chenfei Wu, Nan Duan, Vasudev Lal · 30 Mar 2022
30. Discovering material information using hierarchical Reformer model on financial regulatory filings · Francois Mercier, Makesh Narsimhan · 28 Mar 2022 [AIFin, AI4TS]
31. Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space · Mor Geva, Avi Caciularu, Ke Wang, Yoav Goldberg · 28 Mar 2022 [KELM]
32. On the Importance of Data Size in Probing Fine-tuned Models · Houman Mehrafarin, S. Rajaee, Mohammad Taher Pilehvar · 17 Mar 2022
33. Coloring the Blank Slate: Pre-training Imparts a Hierarchical Inductive Bias to Sequence-to-sequence Models · Aaron Mueller, Robert Frank, Tal Linzen, Luheng Wang, Sebastian Schuster · 17 Mar 2022 [AIMat]
34. Entropy-based Attention Regularization Frees Unintended Bias Mitigation from Lists · Giuseppe Attanasio, Debora Nozza, Dirk Hovy, Elena Baralis · 17 Mar 2022
35. Multi-View Document Representation Learning for Open-Domain Dense Retrieval · Shunyu Zhang, Yaobo Liang, Ming Gong, Daxin Jiang, Nan Duan · 16 Mar 2022 [RALM, 3DV, AI4TS]
36. Signal in Noise: Exploring Meaning Encoded in Random Character Sequences with Character-Aware Language Models · Mark Chu, Bhargav Srinivasa Desikan, E. Nadler, Ruggerio L. Sardo, Elise Darragh-Ford, Douglas Guilbeault · 15 Mar 2022
37. Visualizing and Understanding Patch Interactions in Vision Transformer · Jie Ma, Yalong Bai, Bineng Zhong, Wei Zhang, Ting Yao, Tao Mei · 11 Mar 2022 [ViT]
38. Measuring the Mixing of Contextual Information in the Transformer · Javier Ferrando, Gerard I. Gállego, Marta R. Costa-jussà · 8 Mar 2022
39. Controlling the Focus of Pretrained Language Generation Models · Jiabao Ji, Yoon Kim, James R. Glass, Tianxing He · 2 Mar 2022
40. Tricks and Plugins to GBM on Images and Sequences · Biyi Fang, J. Utke, Diego Klabjan · 1 Mar 2022
41. TrimBERT: Tailoring BERT for Trade-offs · S. N. Sridhar, Anthony Sarah, Sairam Sundaresan · 24 Feb 2022 [MQ]
42. Self-Attention for Incomplete Utterance Rewriting · Yong Zhang, Zhitao Li, Jianzong Wang, Ning Cheng, Jing Xiao · 24 Feb 2022
43. Do Transformers know symbolic rules, and would we know if they did? · Tommi Gröndahl, Yu-Wen Guo, Nirmal Asokan · 19 Feb 2022
44. cosFormer: Rethinking Softmax in Attention · Zhen Qin, Weixuan Sun, Huicai Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong · 17 Feb 2022
45. What Do They Capture? -- A Structural Analysis of Pre-Trained Language Models for Source Code · Yao Wan, Wei-Ye Zhao, Hongyu Zhang, Yulei Sui, Guandong Xu, Hairong Jin · 14 Feb 2022
46. Temporal Attention for Language Models · Guy D. Rosin, Kira Radinsky · 4 Feb 2022 [VLM]
47. Schema-Free Dependency Parsing via Sequence Generation · Boda Lin, Zijun Yao, Jiaxin Shi, S. Cao, Binghao Tang, Si Li, Yong Luo, Juanzi Li, Lei Hou · 28 Jan 2022
48. Rethinking Attention-Model Explainability through Faithfulness Violation Test · Y. Liu, Haoliang Li, Yangyang Guo, Chen Kong, Jing Li, Shiqi Wang · 28 Jan 2022 [FAtt]
49. Table Pre-training: A Survey on Model Architectures, Pre-training Objectives, and Downstream Tasks · Haoyu Dong, Zhoujun Cheng, Xinyi He, Mengyuan Zhou, Anda Zhou, Fan Zhou, Ao Liu, Shi Han, Dongmei Zhang · 24 Jan 2022 [LMTD]
50. An Application of Pseudo-Log-Likelihoods to Natural Language Scoring · Darren Abramson, Ali Emami · 23 Jan 2022