What Does BERT Look At? An Analysis of BERT's Attention (arXiv 1906.04341)
Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning
11 June 2019 · MILM

Papers citing "What Does BERT Look At? An Analysis of BERT's Attention"
50 of 885 papers shown
Char2Subword: Extending the Subword Embedding Space Using Robust Character Compositionality
Gustavo Aguilar, Bryan McCann, Tong Niu, Nazneen Rajani, N. Keskar, Thamar Solorio · 24 Oct 2020
Long Document Ranking with Query-Directed Sparse Transformer
Jyun-Yu Jiang, Chenyan Xiong, Chia-Jung Lee, Wei Wang · 23 Oct 2020
Language Models are Open Knowledge Graphs
Chenguang Wang, Xiao Liu, D. Song · SSL, KELM · 22 Oct 2020
Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling
Wenxuan Zhou, Kevin Huang, Tengyu Ma, Jing Huang · 21 Oct 2020
TurnGPT: a Transformer-based Language Model for Predicting Turn-taking in Spoken Dialog
Erik Ekstedt, Gabriel Skantze · 21 Oct 2020
Better Highlighting: Creating Sub-Sentence Summary Highlights
Sangwoo Cho, Kaiqiang Song, Chen Li, Dong Yu, H. Foroosh, Fei Liu · 20 Oct 2020
Optimal Subarchitecture Extraction For BERT
Adrian de Wynter, Daniel J. Perry · MQ · 20 Oct 2020
A Benchmark for Lease Contract Review
Spyretta Leivaditi, Julien Rossi, Evangelos Kanoulas · AILaw · 20 Oct 2020
Heads-up! Unsupervised Constituency Parsing via Self-Attention Heads
Bowen Li, Taeuk Kim, Reinald Kim Amplayo, Frank Keller · SSL · 19 Oct 2020
Towards Interpreting BERT for Reading Comprehension Based QA
Sahana Ramnath, Preksha Nema, Deep Sahni, Mitesh M. Khapra · 18 Oct 2020
HABERTOR: An Efficient and Effective Deep Hatespeech Detector
T. Tran, Yifan Hu, Changwei Hu, Kevin Yen, Fei Tan, Kyumin Lee, Serim Park · VLM · 17 Oct 2020
Example-Driven Intent Prediction with Observers
Shikib Mehri, Mihail Eric · 17 Oct 2020
Mischief: A Simple Black-Box Attack Against Transformer Architectures
Adrian de Wynter · AAML · 16 Oct 2020
Delaying Interaction Layers in Transformer-based Encoders for Efficient Open Domain Question Answering
W. Siblini, Mohamed Challal, Charlotte Pasqual · 16 Oct 2020
Detecting ESG topics using domain-specific language models and data augmentation approaches
Timothy Nugent, N. Stelea, Jochen L. Leidner · 16 Oct 2020
Understanding Neural Abstractive Summarization Models via Uncertainty
Jiacheng Xu, Shrey Desai, Greg Durrett · UQLM · 15 Oct 2020
Does Chinese BERT Encode Word Structure?
Yile Wang, Leyang Cui, Yue Zhang · 15 Oct 2020
Pair the Dots: Jointly Examining Training History and Test Stimuli for Model Interpretability
Yuxian Meng, Chun Fan, Zijun Sun, Eduard H. Hovy, Fei Wu, Jiwei Li · FAtt · 14 Oct 2020
Pretrained Transformers for Text Ranking: BERT and Beyond
Jimmy J. Lin, Rodrigo Nogueira, Andrew Yates · VLM · 13 Oct 2020
BERT-EMD: Many-to-Many Layer Mapping for BERT Compression with Earth Mover's Distance
Jianquan Li, Xiaokang Liu, Honghong Zhao, Ruifeng Xu, Min Yang, Yaohong Jin · 13 Oct 2020
Zero-shot Entity Linking with Efficient Long Range Sequence Modeling
Zonghai Yao, Liangliang Cao, Huapu Pan · VLM · 12 Oct 2020
Layer-wise Guided Training for BERT: Learning Incrementally Refined Document Representations
Nikolaos Manginas, Ilias Chalkidis, Prodromos Malakasiotis · 12 Oct 2020
EFSG: Evolutionary Fooling Sentences Generator
Marco Di Giovanni, Marco Brambilla · AAML · 12 Oct 2020
Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually)
Alex Warstadt, Yian Zhang, Haau-Sing Li, Haokun Liu, Samuel R. Bowman · SSL, AI4CE · 11 Oct 2020
Structured Self-Attention Weights Encode Semantics in Sentiment Analysis
Zhengxuan Wu, Thanh-Son Nguyen, Desmond C. Ong · MILM · 10 Oct 2020
Query-Key Normalization for Transformers
Alex Henry, Prudhvi Raj Dachapally, S. Pawar, Yuxuan Chen · 08 Oct 2020
BERTering RAMS: What and How Much does BERT Already Know About Event Arguments? -- A Study on the RAMS Dataset
Varun Gangal, Eduard H. Hovy · 08 Oct 2020
Two are Better than One: Joint Entity and Relation Extraction with Table-Sequence Encoders
Jue Wang, Wei Lu · 08 Oct 2020
Assessing Phrasal Representation and Composition in Transformers
Lang-Chi Yu, Allyson Ettinger · CoGe · 08 Oct 2020
Learning to Fuse Sentences with Transformers for Summarization
Logan Lebanoff, Franck Dernoncourt, Doo Soon Kim, Lidan Wang, W. Chang, Fei Liu · 08 Oct 2020
PyMT5: multi-mode translation of natural language and Python code with transformers
Colin B. Clement, Dawn Drain, Jonathan Timcheck, Alexey Svyatkovskiy, Neel Sundaresan · 07 Oct 2020
Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers
Yimeng Wu, Peyman Passban, Mehdi Rezagholizade, Qun Liu · MoE · 06 Oct 2020
Intrinsic Probing through Dimension Selection
Lucas Torroba Hennigen, Adina Williams, Ryan Cotterell · 06 Oct 2020
Analyzing Individual Neurons in Pre-trained Language Models
Nadir Durrani, Hassan Sajjad, Fahim Dalvi, Yonatan Belinkov · MILM · 06 Oct 2020
BERT Knows Punta Cana is not just beautiful, it's gorgeous: Ranking Scalar Adjectives with Contextualised Representations
Aina Garí Soler, Marianna Apidianaki · 06 Oct 2020
LSTMs Compose (and Learn) Bottom-Up
Naomi Saphra, Adam Lopez · CoGe · 06 Oct 2020
Finding the Evidence: Localization-aware Answer Prediction for Text Visual Question Answering
Wei Han, Hantao Huang, Tao Han · 06 Oct 2020
On the Branching Bias of Syntax Extracted from Pre-trained Language Models
Huayang Li, Lemao Liu, Guoping Huang, Shuming Shi · 06 Oct 2020
Guiding Attention for Self-Supervised Learning with Transformers
A. Deshpande, Karthik Narasimhan · 06 Oct 2020
Improving Neural Topic Models using Knowledge Distillation
Alexander Miserlis Hoyle, Pranav Goel, Philip Resnik · 05 Oct 2020
Linguistic Profiling of a Neural Language Model
Alessio Miaschi, D. Brunato, F. Dell’Orletta, Giulia Venturi · 05 Oct 2020
Syntax Representation in Word Embeddings and Neural Networks -- A Survey
Tomasz Limisiewicz, David Marecek · NAI · 02 Oct 2020
LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention
Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto · 02 Oct 2020
Which *BERT? A Survey Organizing Contextualized Encoders
Patrick Xia, Shijie Wu, Benjamin Van Durme · 02 Oct 2020
XDA: Accurate, Robust Disassembly with Transfer Learning
Kexin Pei, Jonas Guan, David Williams-King, Junfeng Yang, Suman Jana · 02 Oct 2020
How LSTM Encodes Syntax: Exploring Context Vectors and Semi-Quantization on Natural Text
Chihiro Shibata, Kei Uchiumi, D. Mochihashi · 01 Oct 2020
Dual Attention Model for Citation Recommendation
Yang Zhang, Qiang Ma · 01 Oct 2020
Bridging Information-Seeking Human Gaze and Machine Reading Comprehension
J. Malmaud, R. Levy, Yevgeni Berzak · 30 Sep 2020
Gender prediction using limited Twitter Data
M. Burghoorn, M. D. Boer, S. Raaijmakers · 29 Sep 2020
A Token-wise CNN-based Method for Sentence Compression
Weiwei Hou, H. Suominen, Piotr Koniusz, Sabrina Caldwell, Tom Gedeon · 23 Sep 2020