FastFormers: Highly Efficient Transformer Models for Natural Language Understanding
Young Jin Kim, Hany Awadalla
26 October 2020 · arXiv:2010.13382
Papers citing "FastFormers: Highly Efficient Transformer Models for Natural Language Understanding" (27 of 27 papers shown)
| Title | Authors | Citations | Date |
|---|---|---|---|
| On Multilingual Encoder Language Model Compression for Low-Resource Languages | Daniil Gurgurov, Michal Gregor, Josef van Genabith, Simon Ostermann | 0 | 22 May 2025 |
| The Unreasonable Ineffectiveness of the Deeper Layers | Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts | 93 | 26 Mar 2024 |
| Linformer: Self-Attention with Linear Complexity | Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma | 1,678 | 08 Jun 2020 |
| BERT Loses Patience: Fast and Robust Inference with Early Exit | Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei | 337 | 07 Jun 2020 |
| Movement Pruning: Adaptive Sparsity by Fine-Tuning | Victor Sanh, Thomas Wolf, Alexander M. Rush | 475 | 15 May 2020 |
| DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference | Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, Jimmy J. Lin | 370 | 27 Apr 2020 |
| The Right Tool for the Job: Matching Model and Instance Complexities | Roy Schwartz, Gabriel Stanovsky, Swabha Swayamdipta, Jesse Dodge, Noah A. Smith | 168 | 16 Apr 2020 |
| Training with Quantization Noise for Extreme Model Compression | Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Hervé Jégou, Armand Joulin | 244 | 15 Apr 2020 |
| DynaBERT: Dynamic BERT with Adaptive Width and Depth | Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu | 322 | 08 Apr 2020 |
| FastBERT: a Self-distilling BERT with Adaptive Inference Time | Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju | 356 | 05 Apr 2020 |
| Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning | Mitchell A. Gordon, Kevin Duh, Nicholas Andrews | 339 | 19 Feb 2020 |
| Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning | Peter Henderson, Jie Hu, Joshua Romoff, Emma Brunskill, Dan Jurafsky, Joelle Pineau | 445 | 31 Jan 2020 |
| Q8BERT: Quantized 8Bit BERT | Ofir Zafrir, Guy Boudoukh, Peter Izsak, Moshe Wasserblat | 502 | 14 Oct 2019 |
| DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf | 7,437 | 02 Oct 2019 |
| TinyBERT: Distilling BERT for Natural Language Understanding | Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu | 1,847 | 23 Sep 2019 |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach | Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov | 24,160 | 26 Jul 2019 |
| Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP | Haonan Yu, Sergey Edunov, Yuandong Tian, Ari S. Morcos | 148 | 06 Jun 2019 |
| Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model | Aishwarya Bhandare, Vamsi Sripathi, Deepthi Karkada, Vivek V. Menon, Sun Choi, Kushal Datta, V. Saletore | 132 | 03 Jun 2019 |
| Are Sixteen Heads Really Better than One? | Paul Michel, Omer Levy, Graham Neubig | 1,051 | 25 May 2019 |
| Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned | Elena Voita, David Talbot, F. Moiseev, Rico Sennrich, Ivan Titov | 1,120 | 23 May 2019 |
| SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems | Alex Jinpeng Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman | 2,296 | 02 May 2019 |
| Efficient Attention: Attention with Linear Complexities | Zhuoran Shen, Mingyuan Zhang, Haiyu Zhao, Shuai Yi, Hongsheng Li | 519 | 04 Dec 2018 |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova | 93,936 | 11 Oct 2018 |
| The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks | Jonathan Frankle, Michael Carbin | 3,433 | 09 Mar 2018 |
| Attention Is All You Need | Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin | 129,831 | 12 Jun 2017 |
| Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU | Jacob Devlin | 36 | 04 May 2017 |
| Distilling the Knowledge in a Neural Network | Geoffrey E. Hinton, Oriol Vinyals, J. Dean | 19,523 | 09 Mar 2015 |