TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu
Findings of EMNLP. arXiv:1909.10351, 23 September 2019.
Papers citing "TinyBERT: Distilling BERT for Natural Language Understanding" (50 of 1,055 shown)
Learning from the Best: Rationalizing Prediction by Adversarial Information Calibration
Lei Sha, Oana-Maria Camburu, Thomas Lukasiewicz. AAAI, 2020. 16 Dec 2020.
A Lightweight Neural Model for Biomedical Entity Linking
Lihu Chen, Gaël Varoquaux, Fabian M. Suchanek. AAAI, 2020. 16 Dec 2020.
EmpLite: A Lightweight Sequence Labeling Model for Emphasis Selection of Short Texts
Vibhav Agarwal, Sourav Ghosh, Kranti Chalamalasetti, B. Challa, S. Kumari, Harshavardhana, Barath Raj Kandur Raja. ICON, 2020. 15 Dec 2020.
Parameter-Efficient Transfer Learning with Diff Pruning
Demi Guo, Alexander M. Rush, Yoon Kim. ACL, 2020. 14 Dec 2020.
LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding
Hao Fu, Shaojun Zhou, Qihong Yang, Junjie Tang, Guiquan Liu, Kaikui Liu, Xiaolong Li. AAAI, 2020. 14 Dec 2020.
MiniVLM: A Smaller and Faster Vision-Language Model
Jianfeng Wang, Xiaowei Hu, Pengchuan Zhang, Xiujun Li, Lijuan Wang, Guang Dai, Jianfeng Gao, Zicheng Liu. 13 Dec 2020.
Reinforced Multi-Teacher Selection for Knowledge Distillation
Fei Yuan, Linjun Shou, Jian Pei, Wutao Lin, Ming Gong, Yan Fu, Daxin Jiang. AAAI, 2020. 11 Dec 2020.
Improving Task-Agnostic BERT Distillation with Layer Mapping Search
Xiaoqi Jiao, Huating Chang, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu. Neurocomputing, 2020. 11 Dec 2020.
Meta-KD: A Meta Knowledge Distillation Framework for Language Model Compression across Domains
Haojie Pan, Chengyu Wang, Minghui Qiu, Yichang Zhang, Yaliang Li, Yanjie Liang. ACL, 2020. 02 Dec 2020.
CPM: A Large-scale Generative Chinese Pre-trained Language Model
Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, ..., Wentao Han, Jie Tang, Juan-Zi Li, Xiaoyan Zhu, Maosong Sun. AI Open, 2020. 01 Dec 2020.
A Selective Survey on Versatile Knowledge Distillation Paradigm for Neural Network Models
J. Ku, Jihun Oh, Youngyoon Lee, Gaurav Pooniwala, Sangjeong Lee. 30 Nov 2020.
Bringing AI To Edge: From Deep Learning's Perspective
Di Liu, Hao Kong, Xiangzhong Luo, Weichen Liu, Ravi Subramaniam. Neurocomputing, 2020. 25 Nov 2020.
EasyTransfer -- A Simple and Scalable Deep Transfer Learning Platform for NLP Applications
Minghui Qiu, Peng Li, Chengyu Wang, Hanjie Pan, Yaliang Li, ..., Jun Yang, Yaliang Li, Yanjie Liang, Deng Cai, Jialin Li. CIKM, 2020. 18 Nov 2020.
Know What You Don't Need: Single-Shot Meta-Pruning for Attention Heads
Zhengyan Zhang, Fanchao Qi, Zhiyuan Liu, Qun Liu, Maosong Sun. 07 Nov 2020.
Influence Patterns for Explaining Information Flow in BERT
Kaiji Lu, Zifan Wang, Piotr (Peter) Mardziel, Anupam Datta. NeurIPS, 2020. 02 Nov 2020.
FastFormers: Highly Efficient Transformer Models for Natural Language Understanding
Young Jin Kim, Hany Awadalla. 26 Oct 2020.
Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping
Minjia Zhang, Yuxiong He. NeurIPS, 2020. 26 Oct 2020.
Pre-trained Summarization Distillation
Sam Shleifer, Alexander M. Rush. 24 Oct 2020.
Knowledge Distillation for Improved Accuracy in Spoken Question Answering
Chenyu You, Polydoros Giannouris, Yuexian Zou. 21 Oct 2020.
Optimal Subarchitecture Extraction For BERT
Adrian de Wynter, Daniel J. Perry. 20 Oct 2020.
BERT2DNN: BERT Distillation with Massive Unlabeled Data for Online E-Commerce Search
Yunjiang Jiang, Yue Shang, Ziyang Liu, Hongwei Shen, Yun Xiao, Wei Xiong, Sulong Xu, Weipeng P. Yan, Di Jin. Industrial Conference on Data Mining (IDM), 2020. 20 Oct 2020.
HABERTOR: An Efficient and Effective Deep Hatespeech Detector
T. Tran, Yifan Hu, Changwei Hu, Kevin Yen, Fei Tan, Kyumin Lee, Serim Park. EMNLP, 2020. 17 Oct 2020.
AutoADR: Automatic Model Design for Ad Relevance
Yiren Chen, Yaming Yang, Hong Sun, Yujing Wang, Yu Xu, Wei Shen, Rong Zhou, Yunhai Tong, Jing Bai, Ruofei Zhang. 14 Oct 2020.
Weight Squeezing: Reparameterization for Knowledge Transfer and Model Compression
Artem Chumachenko, Daniil Gavrilov, Nikita Balagansky, Pavel Kalaidin. 14 Oct 2020.
Pretrained Transformers for Text Ranking: BERT and Beyond
Jimmy J. Lin, Rodrigo Nogueira, Andrew Yates. 13 Oct 2020.
BERT-EMD: Many-to-Many Layer Mapping for BERT Compression with Earth Mover's Distance
Jianquan Li, Xiaokang Liu, Honghong Zhao, Ruifeng Xu, Min Yang, Yaohong Jin. 13 Oct 2020.
Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually)
Alex Warstadt, Yian Zhang, Haau-Sing Li, Haokun Liu, Samuel R. Bowman. 11 Oct 2020.
Adversarial Self-Supervised Data-Free Distillation for Text Classification
Xinyin Ma, Yongliang Shen, Gongfan Fang, Chen Chen, Chenghao Jia, Weiming Lu. EMNLP, 2020. 10 Oct 2020.
Style Attuned Pre-training and Parameter Efficient Fine-tuning for Spoken Language Understanding
Jin Cao, Jun Wang, Wael Hamza, Kelly Vanee, Shang-Wen Li. Interspeech, 2020. 09 Oct 2020.
Deep Learning Meets Projective Clustering
Alaa Maalouf, Harry Lang, Daniela Rus, Dan Feldman. ICLR, 2020. 08 Oct 2020.
On the importance of pre-training data volume for compact language models
Vincent Micheli, Martin d'Hoffschmidt, François Fleuret. 08 Oct 2020.
AxFormer: Accuracy-driven Approximation of Transformers for Faster, Smaller and more Accurate NLP Models
Amrit Nagarajan, Sanchari Sen, Jacob R. Stevens, A. Raghunathan. 07 Oct 2020.
CATBERT: Context-Aware Tiny BERT for Detecting Social Engineering Emails
Younghoon Lee, Joshua Saxe, Richard E. Harang. 07 Oct 2020.
Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers
Yimeng Wu, Peyman Passban, Mehdi Rezagholizadeh, Qun Liu. 06 Oct 2020.
Pruning Redundant Mappings in Transformer Models via Spectral-Normalized Identity Prior
Zi Lin, Jeremiah Zhe Liu, Ziao Yang, Nan Hua, Dan Roth. Findings, 2020. 05 Oct 2020.
Pea-KD: Parameter-efficient and Accurate Knowledge Distillation on BERT
Ikhyun Cho, U. Kang. 30 Sep 2020.
Contrastive Distillation on Intermediate Representations for Language Model Compression
S. Sun, Zhe Gan, Yu Cheng, Yuwei Fang, Shuohang Wang, Jingjing Liu. 29 Sep 2020.
TernaryBERT: Distillation-aware Ultra-low Bit BERT
Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu. EMNLP, 2020. 27 Sep 2020.
A little goes a long way: Improving toxic language classification despite data scarcity
Mika Juuti, Tommi Gröndahl, Adrian Flanagan, Nirmal Asokan. Findings, 2020. 25 Sep 2020.
RecoBERT: A Catalog Language Model for Text-Based Recommendations
Itzik Malkiel, Oren Barkan, Avi Caciularu, Noam Razin, Ori Katz, Noam Koenigstein. Findings, 2020. 25 Sep 2020.
Hierarchical Pre-training for Sequence Labelling in Spoken Dialog
E. Chapuis, Pierre Colombo, Matteo Manica, Matthieu Labeau, Chloé Clavel. Findings, 2020. 23 Sep 2020.
Weight Distillation: Transferring the Knowledge in Neural Network Parameters
Ye Lin, Yanyang Li, Ziyang Wang, Bei Li, Quan Du, Tong Xiao, Jingbo Zhu. ACL, 2020. 19 Sep 2020.
Efficient Transformer-based Large Scale Language Representations using Hardware-friendly Block Structured Pruning
Bingbing Li, Zhenglun Kong, Tianyun Zhang, Ji Li, Hao Sun, Hang Liu, Caiwen Ding. Findings, 2020. 17 Sep 2020.
Simplified TinyBERT: Knowledge Distillation for Document Retrieval
Xuanang Chen, Xianpei Han, Kai Hui, Le Sun, Yingfei Sun. ECIR, 2020. 16 Sep 2020.
It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
Timo Schick, Hinrich Schütze. NAACL, 2020. 15 Sep 2020.
Real-Time Execution of Large-scale Language Models on Mobile
Wei Niu, Zhenglun Kong, Geng Yuan, Weiwen Jiang, Jiexiong Guan, Caiwen Ding, Pu Zhao, Sijia Liu, Bin Ren, Yanzhi Wang. 15 Sep 2020.
Efficient Transformers: A Survey
Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler. ACM Computing Surveys, 2020. 14 Sep 2020.
Compressed Deep Networks: Goodbye SVD, Hello Robust Low-Rank Approximation
M. Tukan, Alaa Maalouf, Matan Weksler, Dan Feldman. 11 Sep 2020.
Pay Attention when Required
Swetha Mandava, Szymon Migacz, A. Fit-Florea. 09 Sep 2020.
Compression of Deep Learning Models for Text: A Survey
Manish Gupta, Puneet Agrawal. ACM Transactions on Knowledge Discovery from Data, 2020. 12 Aug 2020.