Revealing the Dark Secrets of BERT (arXiv:1908.08593)
21 August 2019
Olga Kovaleva, Alexey Romanov, Anna Rogers, Anna Rumshisky
Papers citing "Revealing the Dark Secrets of BERT" (35 of 185 papers shown)
BERTology Meets Biology: Interpreting Attention in Protein Language Models
Jesse Vig, Ali Madani, Lav Varshney, Caiming Xiong, R. Socher, Nazneen Rajani
26 Jun 2020
Memory Transformer
Andrey Kravchenko, Yuri Kuratov, Anton Peganov, Grigory V. Sapunov · RALM
20 Jun 2020
Pre-training Polish Transformer-based Language Models at Scale
Slawomir Dadas, Michal Perelkiewicz, Rafal Poswiata
07 Jun 2020
Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers
K. Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, ..., Peter Hawkins, Jared Davis, David Belanger, Lucy J. Colwell, Adrian Weller
05 Jun 2020
Understanding Self-Attention of Self-Supervised Audio Transformers
Shu-Wen Yang, Andy T. Liu, Hung-yi Lee
05 Jun 2020
Table Search Using a Deep Contextualized Language Model
Zhiyu Zoey Chen, M. Trabelsi, J. Heflin, Yinan Xu, Brian D. Davison · LMTD
19 May 2020
Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
Jize Cao, Zhe Gan, Yu Cheng, Licheng Yu, Yen-Chun Chen, Jingjing Liu · VLM
15 May 2020
GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference
Ali Hadi Zadeh, Isak Edo, Omar Mohamed Awad, Andreas Moshovos · MQ
08 May 2020
The Cascade Transformer: an Application for Efficient Answer Sentence Selection
Luca Soldaini, Alessandro Moschitti
05 May 2020
Similarity Analysis of Contextual Word Representation Models
John M. Wu, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, James R. Glass
03 May 2020
When BERT Plays the Lottery, All Tickets Are Winning
Sai Prasanna, Anna Rogers, Anna Rumshisky · MILM
01 May 2020
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
Linjie Li, Yen-Chun Chen, Yu Cheng, Zhe Gan, Licheng Yu, Jingjing Liu · MLLM · VLM · OffRL · AI4TS
01 May 2020
A Matter of Framing: The Impact of Linguistic Formalism on Probing Results
Ilia Kuznetsov, Iryna Gurevych
30 Apr 2020
How do Decisions Emerge across Layers in Neural Models? Interpretation with Differentiable Masking
Nicola De Cao, Michael Schlichtkrull, Wilker Aziz, Ivan Titov
30 Apr 2020
What Happens To BERT Embeddings During Fine-tuning?
Amil Merchant, Elahe Rahimtoroghi, Ellie Pavlick, Ian Tenney
29 Apr 2020
Masking as an Efficient Alternative to Finetuning for Pretrained Language Models
Mengjie Zhao, Tao R. Lin, Fei Mi, Martin Jaggi, Hinrich Schütze
26 Apr 2020
Quantifying the Contextualization of Word Representations with Semantic Class Probing
Mengjie Zhao, Philipp Dufter, Yadollah Yaghoobzadeh, Hinrich Schütze
25 Apr 2020
Lite Transformer with Long-Short Range Attention
Zhanghao Wu, Zhijian Liu, Ji Lin, Chengyue Wu, Song Han
24 Apr 2020
Longformer: The Long-Document Transformer
Iz Beltagy, Matthew E. Peters, Arman Cohan · RALM · VLM
10 Apr 2020
DynaBERT: Dynamic BERT with Adaptive Width and Depth
Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu · MQ
08 Apr 2020
FastBERT: a Self-distilling BERT with Adaptive Inference Time
Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju
05 Apr 2020
AriEL: volume coding for sentence generation
Luca Herranz-Celotti, Simon Brodeur, Jean Rouat
30 Mar 2020
Information-Theoretic Probing with Minimum Description Length
Elena Voita, Ivan Titov
27 Mar 2020
Calibration of Pre-trained Transformers
Shrey Desai, Greg Durrett · UQLM
17 Mar 2020
A Survey on Contextual Embeddings
Qi Liu, Matt J. Kusner, Phil Blunsom
16 Mar 2020
A Primer in BERTology: What we know about how BERT works
Anna Rogers, Olga Kovaleva, Anna Rumshisky · OffRL
27 Feb 2020
Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yifan Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, Marianne Winslett · AI4CE
27 Feb 2020
Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation
Alessandro Raganato, Yves Scherrer, Jörg Tiedemann
24 Feb 2020
SBERT-WK: A Sentence Embedding Method by Dissecting BERT-based Word Models
Bin Wang, C.-C. Jay Kuo
16 Feb 2020
What Would Elsa Do? Freezing Layers During Transformer Fine-Tuning
Jaejun Lee, Raphael Tang, Jimmy J. Lin
08 Nov 2019
HUBERT Untangles BERT to Improve Transfer across NLP Tasks
M. Moradshahi, Hamid Palangi, M. Lam, P. Smolensky, Jianfeng Gao
25 Oct 2019
Emergent Properties of Finetuned Language Representation Models
Alexandre Matton, Luke de Oliveira · SSL
23 Oct 2019
Structured Pruning of a BERT-based Question Answering Model
J. Scott McCarley, Rishav Chakravarti, Avirup Sil
14 Oct 2019
UNITER: UNiversal Image-TExt Representation Learning
Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, Jingjing Liu · VLM · OT
25 Sep 2019
TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu · VLM
23 Sep 2019