MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices [MQ]
arXiv:2004.02984 · 6 April 2020
Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou
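
The headline paper proposes a compact, task-agnostic BERT meant to transfer to downstream tasks on mobile-class hardware. For orientation before the citation list, here is a minimal sketch of running the model as a feature extractor; it assumes the Hugging Face transformers library and the publicly released google/mobilebert-uncased checkpoint, neither of which is referenced on this page:

    # Minimal sketch (assumes: pip install torch transformers, and the
    # google/mobilebert-uncased checkpoint on the Hugging Face Hub).
    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("google/mobilebert-uncased")
    model = AutoModel.from_pretrained("google/mobilebert-uncased")
    model.eval()  # inference only

    inputs = tokenizer("MobileBERT targets resource-limited devices.",
                       return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # MobileBERT's body width is 512, so this prints (1, seq_len, 512).
    print(outputs.last_hidden_state.shape)
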
Papers citing "MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices" (45 of 145 papers shown)

FQuAD2.0: French Question Answering and knowing that you know nothing
Quentin Heinrich, Gautier Viaud, Wacim Belblidia · 27 Sep 2021

Understanding and Overcoming the Challenges of Efficient Transformer Quantization [MQ]
Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort · 27 Sep 2021

RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation
Md. Akmal Haidar, Nithin Anchuri, Mehdi Rezagholizadeh, Abbas Ghaddar, Philippe Langlais, Pascal Poupart · 21 Sep 2021

Knowledge Distillation with Noisy Labels for Natural Language Understanding
Shivendra Bhardwaj, Abbas Ghaddar, Ahmad Rashid, Khalil Bibi, Cheng-huan Li, A. Ghodsi, Philippe Langlais, Mehdi Rezagholizadeh · 21 Sep 2021

EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation
Chenhe Dong, Guangrun Wang, Hang Xu, Jiefeng Peng, Xiaozhe Ren, Xiaodan Liang · 15 Sep 2021

Will this Question be Answered? Question Filtering via Answer Model Distillation for Efficient Question Answering
Siddhant Garg, Alessandro Moschitti · 14 Sep 2021

KroneckerBERT: Learning Kronecker Decomposition for Pre-trained Language Models via Knowledge Distillation
Marzieh S. Tahaei, Ella Charlaix, V. Nia, A. Ghodsi, Mehdi Rezagholizadeh · 13 Sep 2021

How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding [AAML]
Tianda Li, Ahmad Rashid, A. Jafari, Pranav Sharma, A. Ghodsi, Mehdi Rezagholizadeh · 13 Sep 2021

Compute and Energy Consumption Trends in Deep Learning Inference
Radosvet Desislavov, Fernando Martínez-Plumed, José Hernández-Orallo · 12 Sep 2021

Block Pruning For Faster Transformers [VLM]
François Lagunas, Ella Charlaix, Victor Sanh, Alexander M. Rush · 10 Sep 2021

Learning to Teach with Student Feedback [VLM]
Yitao Liu, Tianxiang Sun, Xipeng Qiu, Xuanjing Huang · 10 Sep 2021

PIMNet: A Parallel, Iterative and Mimicking Network for Scene Text Recognition
Zhi Qiao, Yu Zhou, Jin Wei, Wei Wang, Yuanqing Zhang, Ning Jiang, Hongbin Wang, Weiping Wang · 09 Sep 2021

What's Hidden in a One-layer Randomly Weighted Transformer?
Sheng Shen, Z. Yao, Douwe Kiela, Kurt Keutzer, Michael W. Mahoney · 08 Sep 2021

Sequential Attention Module for Natural Language Processing [AI4TS]
Mengyuan Zhou, Jian Ma, Haiqing Yang, Lian-Xin Jiang, Yang Mo · 07 Sep 2021

Greenformers: Improving Computation and Memory Efficiency in Transformer Models via Low-Rank Approximation
Samuel Cahyawijaya · 24 Aug 2021

FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks
Sheng-Chun Kao, Suvinay Subramanian, Gaurav Agrawal, Amir Yazdanbakhsh, T. Krishna · 13 Jul 2021

Learned Token Pruning for Transformers
Sehoon Kim, Sheng Shen, D. Thorsley, A. Gholami, Woosuk Kwon, Joseph Hassoun, Kurt Keutzer · 02 Jul 2021

Open, Sesame! Introducing Access Control to Voice Services [AAML]
Dominika Woszczyk, Alvin Lee, Soteris Demetriou · 27 Jun 2021

Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better [VLM, MedIm]
Gaurav Menghani · 16 Jun 2021

XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation
Subhabrata Mukherjee, Ahmed Hassan Awadallah, Jianfeng Gao · 08 Jun 2021

How Good Is NLP? A Sober Look at NLP Tasks through the Lens of Social Impact
Zhijing Jin, Geeticka Chauhan, Brian Tse, Mrinmaya Sachan, Rada Mihalcea · 04 Jun 2021

ERNIE-Tiny: A Progressive Distillation Framework for Pretrained Transformer Compression
Weiyue Su, Xuyi Chen, Shi Feng, Jiaxiang Liu, Weixin Liu, Yu Sun, Hao Tian, Hua-Hong Wu, Haifeng Wang · 04 Jun 2021

Disfluency Detection with Unlabeled Data and Small BERT Models
Johann C. Rocholl, Vicky Zayats, D. D. Walker, Noah B. Murad, Aaron Schneider, Daniel J. Liebling · 21 Apr 2021

Compressing Visual-linguistic Model via Knowledge Distillation [VLM]
Zhiyuan Fang, Jianfeng Wang, Xiaowei Hu, Lijuan Wang, Yezhou Yang, Zicheng Liu · 05 Apr 2021

The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures [AI4TS]
Sushant Singh, A. Mahmood · 23 Mar 2021

Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth
Yihe Dong, Jean-Baptiste Cordonnier, Andreas Loukas · 05 Mar 2021

SEED: Self-supervised Distillation For Visual Representation [SSL]
Zhiyuan Fang, Jianfeng Wang, Lijuan Wang, Lei Zhang, Yezhou Yang, Zicheng Liu · 12 Jan 2021

Reservoir Transformers
Sheng Shen, Alexei Baevski, Ari S. Morcos, Kurt Keutzer, Michael Auli, Douwe Kiela · 30 Dec 2020

LiteMuL: A Lightweight On-Device Sequence Tagger using Multi-task Learning
S. Kumari, Vibhav Agarwal, B. Challa, Kranti Chalamalasetti, Sourav Ghosh, Harshavardhana, Barath Raj Kandur Raja · 15 Dec 2020

Parameter-Efficient Transfer Learning with Diff Pruning
Demi Guo, Alexander M. Rush, Yoon Kim · 14 Dec 2020

LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding
Hao Fu, Shaojun Zhou, Qihong Yang, Junjie Tang, Guiquan Liu, Kaikui Liu, Xiaolong Li · 14 Dec 2020

MiniVLM: A Smaller and Faster Vision-Language Model [VLM, MLLM]
Jianfeng Wang, Xiaowei Hu, Pengchuan Zhang, Xiujun Li, Lijuan Wang, L. Zhang, Jianfeng Gao, Zicheng Liu · 13 Dec 2020

Two Stage Transformer Model for COVID-19 Fake News Detection and Fact Checking [MedIm]
Rutvik Vijjali, Prathyush Potluri, S. Kumar, Sundeep Teki · 26 Nov 2020

Bringing AI To Edge: From Deep Learning's Perspective
Di Liu, Hao Kong, Xiangzhong Luo, Weichen Liu, Ravi Subramaniam · 25 Nov 2020

Know What You Don't Need: Single-Shot Meta-Pruning for Attention Heads [VLM]
Zhengyan Zhang, Fanchao Qi, Zhiyuan Liu, Qun Liu, Maosong Sun · 07 Nov 2020

Federated Knowledge Distillation [FedML]
Hyowoon Seo, Jihong Park, Seungeun Oh, M. Bennis, Seong-Lyun Kim · 04 Nov 2020

Pre-trained Summarization Distillation
Sam Shleifer, Alexander M. Rush · 24 Oct 2020

Rethinking embedding coupling in pre-trained language models
Hyung Won Chung, Thibault Févry, Henry Tsai, Melvin Johnson, Sebastian Ruder · 24 Oct 2020

AdapterDrop: On the Efficiency of Adapters in Transformers
Andreas Rucklé, Gregor Geigle, Max Glockner, Tilman Beck, Jonas Pfeiffer, Nils Reimers, Iryna Gurevych · 22 Oct 2020

ConvBERT: Improving BERT with Span-based Dynamic Convolution
Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan · 06 Aug 2020

Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le · 05 Jun 2020

GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference [MQ]
Ali Hadi Zadeh, Isak Edo, Omar Mohamed Awad, Andreas Moshovos · 08 May 2020

DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering
Qingqing Cao, H. Trivedi, A. Balasubramanian, Niranjan Balasubramanian · 02 May 2020

Pre-trained Models for Natural Language Processing: A Survey [LM&MA, VLM]
Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang · 18 Mar 2020

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding [ELM]
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman · 20 Apr 2018