Visualizing and Understanding the Effectiveness of BERT
Yaru Hao, Li Dong, Furu Wei, Ke Xu
arXiv:1908.05620, 15 August 2019
Papers citing "Visualizing and Understanding the Effectiveness of BERT" (22 of 22 papers shown):
HATFormer: Historic Handwritten Arabic Text Recognition with Transformers. Adrian Chan, Anupam Mijar, Mehreen Saeed, Chau-Wai Wong, Akram Khater. 03 Oct 2024.
Empowering Dual-Level Graph Self-Supervised Pretraining with Motif Discovery. Pengwei Yan, Kaisong Song, Zhuoren Jiang, Yangyang Kang, Tianqianjin Lin, Changlong Sun, Xiaozhong Liu. 19 Dec 2023.
Sparse is Enough in Fine-tuning Pre-trained Large Language Models. Weixi Song, Z. Li, Lefei Zhang, Hai Zhao, Bo Du. 19 Dec 2023.
Full Parameter Fine-tuning for Large Language Models with Limited Resources. Kai Lv, Yuqing Yang, Tengxiao Liu, Qi-jie Gao, Qipeng Guo, Xipeng Qiu. 16 Jun 2023.
KL Regularized Normalization Framework for Low Resource Tasks. Neeraj Kumar, Ankur Narang, Brejesh Lall. 21 Dec 2022.
Exploring Mode Connectivity for Pre-trained Language Models. Yujia Qin, Cheng Qian, Jing Yi, Weize Chen, Yankai Lin, Xu Han, Zhiyuan Liu, Maosong Sun, Jie Zhou. 25 Oct 2022.
Perspectives of Non-Expert Users on Cyber Security and Privacy: An Analysis of Online Discussions on Twitter. Nandita Pattnaik, Shujun Li, Jason R. C. Nurse. 05 Jun 2022.
Silence is Sweeter Than Speech: Self-Supervised Model Using Silence to Store Speaker Information. Chiyu Feng, Po-Chun Hsu, Hung-yi Lee. 08 May 2022.
BERTops: Studying BERT Representations under a Topological Lens. Jatin Chauhan, Manohar Kaul. 02 May 2022.
Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection. Xin Huang, Ashish Khetan, Rene Bidart, Zohar S. Karnin. 27 Mar 2022.
DeepNet: Scaling Transformers to 1,000 Layers. Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Furu Wei. 01 Mar 2022.
A Survey of Pretraining on Graphs: Taxonomy, Methods, and Applications. Jun Xia, Yanqiao Zhu, Yuanqi Du, Stan Z. Li. 16 Feb 2022.
Interpreting Deep Learning Models in Natural Language Processing: A Review. Xiaofei Sun, Diyi Yang, Xiaoya Li, Tianwei Zhang, Yuxian Meng, Han Qiu, Guoyin Wang, Eduard H. Hovy, Jiwei Li. 20 Oct 2021.
How Does Adversarial Fine-Tuning Benefit BERT? Javid Ebrahimi, Hao Yang, Wei Zhang. 31 Aug 2021.
Neural Databases. James Thorne, Majid Yazdani, Marzieh Saeidi, Fabrizio Silvestri, Sebastian Riedel, Alon Halevy. 14 Oct 2020.
Attention Flows: Analyzing and Comparing Attention Mechanisms in Language Models. Joseph F. DeRose, Jiayao Wang, Matthew Berger. 03 Sep 2020.
PERL: Pivot-based Domain Adaptation for Pre-trained Deep Contextualized Embedding Models. Eyal Ben-David, Carmel Rabinovitz, Roi Reichart. 16 Jun 2020.
Generative Data Augmentation for Commonsense Reasoning. Yiben Yang, Chaitanya Malaviya, Jared Fernandez, Swabha Swayamdipta, Ronan Le Bras, Ji-ping Wang, Chandra Bhagavatula, Yejin Choi, Doug Downey. 24 Apr 2020.
Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation. Alessandro Raganato, Yves Scherrer, Jörg Tiedemann. 24 Feb 2020.
Sequential Latent Knowledge Selection for Knowledge-Grounded Dialogue. Byeongchang Kim, Jaewoo Ahn, Gunhee Kim. 18 Feb 2020.
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman. 20 Apr 2018.
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, Ping Tak Peter Tang. 15 Sep 2016.