What Does BERT Look At? An Analysis of BERT's Attention
Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning
11 June 2019 · arXiv:1906.04341 [MILM]

Papers citing "What Does BERT Look At? An Analysis of BERT's Attention" (50 of 885 shown)

1. A Study of the Attention Abnormality in Trojaned BERTs · Weimin Lyu, Songzhu Zheng, Teng Ma, Chao Chen · 13 May 2022
2. Exploiting Inductive Bias in Transformers for Unsupervised Disentanglement of Syntax and Semantics with VAEs · G. Felhi, Joseph Le Roux, Djamé Seddah · 12 May 2022 [DRL]
3. A Song of (Dis)agreement: Evaluating the Evaluation of Explainable Artificial Intelligence in Natural Language Processing · Michael Neely, Stefan F. Schouten, Maurits J. R. Bleeker, Ana Lucic · 9 May 2022 [XAI]
4. Unsupervised Slot Schema Induction for Task-oriented Dialog · Dian Yu, Mingqiu Wang, Yuan Cao, Izhak Shafran, Laurent El Shafey, H. Soltau · 9 May 2022
5. EigenNoise: A Contrastive Prior to Warm-Start Representations · H. Heidenreich, Jake Williams · 9 May 2022
6. Silence is Sweeter Than Speech: Self-Supervised Model Using Silence to Store Speaker Information · Chiyu Feng, Po-Chun Hsu, Hung-yi Lee · 8 May 2022 [SSL]
7. When a sentence does not introduce a discourse entity, Transformer-based models still sometimes refer to it · Sebastian Schuster, Tal Linzen · 6 May 2022
8. GlobEnc: Quantifying Global Token Attribution by Incorporating the Whole Encoder Layer in Transformers · Ali Modarressi, Mohsen Fayyaz, Yadollah Yaghoobzadeh, Mohammad Taher Pilehvar · 6 May 2022 [ViT]
9. Diversifying Neural Dialogue Generation via Negative Distillation · Yiwei Li, Shaoxiong Feng, Bin Sun, Kan Li · 5 May 2022
10. Adaptable Adapters · N. Moosavi, Quentin Delfosse, Kristian Kersting, Iryna Gurevych · 3 May 2022
11. BERTops: Studying BERT Representations under a Topological Lens · Jatin Chauhan, Manohar Kaul · 2 May 2022
12. POLITICS: Pretraining with Same-story Article Comparison for Ideology Prediction and Stance Detection · Yujian Liu, Xinliang Frederick Zhang, David Wegsman, Nick Beauchamp, Lu Wang · 2 May 2022
13. Visualizing and Explaining Language Models · Adrian M. P. Braşoveanu, Razvan Andonie · 30 Apr 2022 [MILM, VLM]
14. RobBERTje: a Distilled Dutch BERT Model · Pieter Delobelle, Thomas Winters, Bettina Berendt · 28 Apr 2022
15. Attention Mechanism in Neural Networks: Where it Comes and Where it Goes · Derya Soydaner · 27 Apr 2022 [3DV]
16. Do Transformer Models Show Similar Attention Patterns to Task-Specific Human Gaze? · Stephanie Brandl, Oliver Eberle, Jonas Pilot, Anders Søgaard · 25 Apr 2022
17. Grad-SAM: Explaining Transformers via Gradient Self-Attention Maps · Oren Barkan, Edan Hauon, Avi Caciularu, Ori Katz, Itzik Malkiel, Omri Armstrong, Noam Koenigstein · 23 Apr 2022
18. An Attention-Based Model for Predicting Contextual Informativeness and Curriculum Learning Applications · Sungjin Nam, David Jurgens, Gwen Frishkoff, Kevyn Collins-Thompson · 21 Apr 2022
19. Probing Script Knowledge from Pre-Trained Models · Zijian Jin, Xingyu Zhang, Mo Yu, Lifu Huang · 16 Apr 2022
20. A Review on Language Models as Knowledge Bases · Badr AlKhamissi, Millicent Li, Asli Celikyilmaz, Mona T. Diab, Marjan Ghazvininejad · 12 Apr 2022 [KELM]
21. What do Toothbrushes do in the Kitchen? How Transformers Think our World is Structured · Alexander Henlein, Alexander Mehler · 12 Apr 2022
22. Team ÚFAL at CMCL 2022 Shared Task: Figuring out the correct recipe for predicting Eye-Tracking features using Pretrained Language Models · Sunit Bhattacharya, Rishu Kumar, Ondrej Bojar · 11 Apr 2022
23. How Conservative are Language Models? Adapting to the Introduction of Gender-Neutral Pronouns · Stephanie Brandl, Ruixiang Cui, Anders Søgaard · 11 Apr 2022
24. Contextual Representation Learning beyond Masked Language Modeling · Zhiyi Fu, Wangchunshu Zhou, Jingjing Xu, Hao Zhou, Lei Li · 8 Apr 2022
25. Paying More Attention to Self-attention: Improving Pre-trained Language Models via Attention Guiding · Shanshan Wang, Zhumin Chen, Z. Ren, Huasheng Liang, Qiang Yan, Pengjie Ren · 6 Apr 2022
26. An Exploratory Study on Code Attention in BERT · Rishab Sharma, Fuxiang Chen, Fatemeh H. Fard, David Lo · 5 Apr 2022
27. On Explaining Multimodal Hateful Meme Detection Models · Ming Shan Hee, Roy Ka-Wei Lee, Wen-Haw Chong · 4 Apr 2022 [VLM]
28. Incorporating Dynamic Semantics into Pre-Trained Language Model for Aspect-based Sentiment Analysis · Kai Zhang, Kunpeng Zhang, Mengdi Zhang, Hongke Zhao, Qi Liu, Wei Yu Wu, Enhong Chen · 30 Mar 2022
29. VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers · Estelle Aflalo, Meng Du, Shao-Yen Tseng, Yongfei Liu, Chenfei Wu, Nan Duan, Vasudev Lal · 30 Mar 2022
30. Discovering material information using hierarchical Reformer model on financial regulatory filings · Francois Mercier, Makesh Narsimhan · 28 Mar 2022 [AIFin, AI4TS]
31. Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space · Mor Geva, Avi Caciularu, Ke Wang, Yoav Goldberg · 28 Mar 2022 [KELM]
32. On the Importance of Data Size in Probing Fine-tuned Models · Houman Mehrafarin, S. Rajaee, Mohammad Taher Pilehvar · 17 Mar 2022
33. Coloring the Blank Slate: Pre-training Imparts a Hierarchical Inductive Bias to Sequence-to-sequence Models · Aaron Mueller, Robert Frank, Tal Linzen, Luheng Wang, Sebastian Schuster · 17 Mar 2022 [AIMat]
34. Entropy-based Attention Regularization Frees Unintended Bias Mitigation from Lists · Giuseppe Attanasio, Debora Nozza, Dirk Hovy, Elena Baralis · 17 Mar 2022
35. Multi-View Document Representation Learning for Open-Domain Dense Retrieval · Shunyu Zhang, Yaobo Liang, Ming Gong, Daxin Jiang, Nan Duan · 16 Mar 2022 [RALM, 3DV, AI4TS]
36. Signal in Noise: Exploring Meaning Encoded in Random Character Sequences with Character-Aware Language Models · Mark Chu, Bhargav Srinivasa Desikan, E. Nadler, Ruggerio L. Sardo, Elise Darragh-Ford, Douglas Guilbeault · 15 Mar 2022
37. Visualizing and Understanding Patch Interactions in Vision Transformer · Jie Ma, Yalong Bai, Bineng Zhong, Wei Zhang, Ting Yao, Tao Mei · 11 Mar 2022 [ViT]
38. Measuring the Mixing of Contextual Information in the Transformer · Javier Ferrando, Gerard I. Gállego, Marta R. Costa-jussà · 8 Mar 2022
39. Controlling the Focus of Pretrained Language Generation Models · Jiabao Ji, Yoon Kim, James R. Glass, Tianxing He · 2 Mar 2022
40. Tricks and Plugins to GBM on Images and Sequences · Biyi Fang, J. Utke, Diego Klabjan · 1 Mar 2022
41. TrimBERT: Tailoring BERT for Trade-offs · S. N. Sridhar, Anthony Sarah, Sairam Sundaresan · 24 Feb 2022 [MQ]
42. Self-Attention for Incomplete Utterance Rewriting · Yong Zhang, Zhitao Li, Jianzong Wang, Ning Cheng, Jing Xiao · 24 Feb 2022
43. Do Transformers know symbolic rules, and would we know if they did? · Tommi Gröndahl, Yu-Wen Guo, Nirmal Asokan · 19 Feb 2022
44. cosFormer: Rethinking Softmax in Attention · Zhen Qin, Weixuan Sun, Huicai Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong · 17 Feb 2022
45. What Do They Capture? -- A Structural Analysis of Pre-Trained Language Models for Source Code · Yao Wan, Wei-Ye Zhao, Hongyu Zhang, Yulei Sui, Guandong Xu, Hairong Jin · 14 Feb 2022
46. Temporal Attention for Language Models · Guy D. Rosin, Kira Radinsky · 4 Feb 2022 [VLM]
47. Schema-Free Dependency Parsing via Sequence Generation · Boda Lin, Zijun Yao, Jiaxin Shi, S. Cao, Binghao Tang, Si Li, Yong Luo, Juanzi Li, Lei Hou · 28 Jan 2022
48. Rethinking Attention-Model Explainability through Faithfulness Violation Test · Y. Liu, Haoliang Li, Yangyang Guo, Chen Kong, Jing Li, Shiqi Wang · 28 Jan 2022 [FAtt]
49. Table Pre-training: A Survey on Model Architectures, Pre-training Objectives, and Downstream Tasks · Haoyu Dong, Zhoujun Cheng, Xinyi He, Mengyuan Zhou, Anda Zhou, Fan Zhou, Ao Liu, Shi Han, Dongmei Zhang · 24 Jan 2022 [LMTD]
50. An Application of Pseudo-Log-Likelihoods to Natural Language Scoring · Darren Abramson, Ali Emami · 23 Jan 2022