FastFormers: Highly Efficient Transformer Models for Natural Language Understanding

26 October 2020
Young Jin Kim, Hany Awadalla
Communities: AI4CE

Papers citing "FastFormers: Highly Efficient Transformer Models for Natural Language Understanding"

27 papers shown
| Title | Authors | Communities | Metrics | Date |
|---|---|---|---|---|
| On Multilingual Encoder Language Model Compression for Low-Resource Languages | Daniil Gurgurov, Michal Gregor, Josef van Genabith, Simon Ostermann | | 96 / 0 / 0 | 22 May 2025 |
| The Unreasonable Ineffectiveness of the Deeper Layers | Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts | | 78 / 93 / 0 | 26 Mar 2024 |
| Linformer: Self-Attention with Linear Complexity | Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma | | 170 / 1,678 / 0 | 08 Jun 2020 |
| BERT Loses Patience: Fast and Robust Inference with Early Exit | Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei | | 36 / 337 / 0 | 07 Jun 2020 |
| Movement Pruning: Adaptive Sparsity by Fine-Tuning | Victor Sanh, Thomas Wolf, Alexander M. Rush | | 57 / 475 / 0 | 15 May 2020 |
| DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference | Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, Jimmy J. Lin | | 42 / 370 / 0 | 27 Apr 2020 |
| The Right Tool for the Job: Matching Model and Instance Complexities | Roy Schwartz, Gabriel Stanovsky, Swabha Swayamdipta, Jesse Dodge, Noah A. Smith | | 73 / 168 / 0 | 16 Apr 2020 |
| Training with Quantization Noise for Extreme Model Compression | Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Hervé Jégou, Armand Joulin | MQ | 57 / 244 / 0 | 15 Apr 2020 |
| DynaBERT: Dynamic BERT with Adaptive Width and Depth | Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu | MQ | 65 / 322 / 0 | 08 Apr 2020 |
| FastBERT: a Self-distilling BERT with Adaptive Inference Time | Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju | | 75 / 356 / 0 | 05 Apr 2020 |
| Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning | Mitchell A. Gordon, Kevin Duh, Nicholas Andrews | VLM | 37 / 339 / 0 | 19 Feb 2020 |
| Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning | Peter Henderson, Jie Hu, Joshua Romoff, Emma Brunskill, Dan Jurafsky, Joelle Pineau | | 68 / 445 / 0 | 31 Jan 2020 |
| Q8BERT: Quantized 8Bit BERT | Ofir Zafrir, Guy Boudoukh, Peter Izsak, Moshe Wasserblat | MQ | 57 / 502 / 0 | 14 Oct 2019 |
| DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf | | 134 / 7,437 / 0 | 02 Oct 2019 |
| TinyBERT: Distilling BERT for Natural Language Understanding | Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu | VLM | 62 / 1,847 / 0 | 23 Sep 2019 |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach | Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov | AIMat | 408 / 24,160 / 0 | 26 Jul 2019 |
| Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP | Haonan Yu, Sergey Edunov, Yuandong Tian, Ari S. Morcos | | 49 / 148 / 0 | 06 Jun 2019 |
| Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model | Aishwarya Bhandare, Vamsi Sripathi, Deepthi Karkada, Vivek V. Menon, Sun Choi, Kushal Datta, V. Saletore | MQ | 53 / 132 / 0 | 03 Jun 2019 |
| Are Sixteen Heads Really Better than One? | Paul Michel, Omer Levy, Graham Neubig | MoE | 79 / 1,051 / 0 | 25 May 2019 |
| Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned | Elena Voita, David Talbot, F. Moiseev, Rico Sennrich, Ivan Titov | | 76 / 1,120 / 0 | 23 May 2019 |
| SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems | Alex Jinpeng Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman | ELM | 183 / 2,296 / 0 | 02 May 2019 |
| Efficient Attention: Attention with Linear Complexities | Zhuoran Shen, Mingyuan Zhang, Haiyu Zhao, Shuai Yi, Hongsheng Li | | 83 / 519 / 0 | 04 Dec 2018 |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova | VLM, SSL, SSeg | 966 / 93,936 / 0 | 11 Oct 2018 |
| The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks | Jonathan Frankle, Michael Carbin | | 170 / 3,433 / 0 | 09 Mar 2018 |
| Attention Is All You Need | Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin | 3DV | 453 / 129,831 / 0 | 12 Jun 2017 |
| Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU | Jacob Devlin | | 49 / 36 / 0 | 04 May 2017 |
| Distilling the Knowledge in a Neural Network | Geoffrey E. Hinton, Oriol Vinyals, J. Dean | FedML | 241 / 19,523 / 0 | 09 Mar 2015 |