FastFormers: Highly Efficient Transformer Models for Natural Language Understanding

26 October 2020
Young Jin Kim, Hany Awadalla
Communities: AI4CE

Papers citing "FastFormers: Highly Efficient Transformer Models for Natural Language Understanding"

27 papers shown
| Title | Authors | Communities | Metrics | Date |
|---|---|---|---|---|
| On Multilingual Encoder Language Model Compression for Low-Resource Languages | Daniil Gurgurov, Michal Gregor, Josef van Genabith, Simon Ostermann | | 96 / 0 / 0 | 22 May 2025 |
| The Unreasonable Ineffectiveness of the Deeper Layers | Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts | | 78 / 93 / 0 | 26 Mar 2024 |
| Linformer: Self-Attention with Linear Complexity | Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma | | 170 / 1,678 / 0 | 08 Jun 2020 |
| BERT Loses Patience: Fast and Robust Inference with Early Exit | Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei | | 36 / 337 / 0 | 07 Jun 2020 |
| Movement Pruning: Adaptive Sparsity by Fine-Tuning | Victor Sanh, Thomas Wolf, Alexander M. Rush | | 57 / 475 / 0 | 15 May 2020 |
| DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference | Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, Jimmy J. Lin | | 42 / 370 / 0 | 27 Apr 2020 |
| The Right Tool for the Job: Matching Model and Instance Complexities | Roy Schwartz, Gabriel Stanovsky, Swabha Swayamdipta, Jesse Dodge, Noah A. Smith | | 73 / 168 / 0 | 16 Apr 2020 |
| Training with Quantization Noise for Extreme Model Compression | Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Hervé Jégou, Armand Joulin | MQ | 57 / 244 / 0 | 15 Apr 2020 |
| DynaBERT: Dynamic BERT with Adaptive Width and Depth | Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu | MQ | 65 / 322 / 0 | 08 Apr 2020 |
| FastBERT: a Self-distilling BERT with Adaptive Inference Time | Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju | | 75 / 356 / 0 | 05 Apr 2020 |
| Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning | Mitchell A. Gordon, Kevin Duh, Nicholas Andrews | VLM | 37 / 339 / 0 | 19 Feb 2020 |
| Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning | Peter Henderson, Jie Hu, Joshua Romoff, Emma Brunskill, Dan Jurafsky, Joelle Pineau | | 68 / 445 / 0 | 31 Jan 2020 |
| Q8BERT: Quantized 8Bit BERT | Ofir Zafrir, Guy Boudoukh, Peter Izsak, Moshe Wasserblat | MQ | 57 / 502 / 0 | 14 Oct 2019 |
| DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf | | 134 / 7,437 / 0 | 02 Oct 2019 |
| TinyBERT: Distilling BERT for Natural Language Understanding | Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu | VLM | 62 / 1,847 / 0 | 23 Sep 2019 |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach | Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov | AIMat | 408 / 24,160 / 0 | 26 Jul 2019 |
| Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP | Haonan Yu, Sergey Edunov, Yuandong Tian, Ari S. Morcos | | 49 / 148 / 0 | 06 Jun 2019 |
| Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model | Aishwarya Bhandare, Vamsi Sripathi, Deepthi Karkada, Vivek V. Menon, Sun Choi, Kushal Datta, V. Saletore | MQ | 53 / 132 / 0 | 03 Jun 2019 |
| Are Sixteen Heads Really Better than One? | Paul Michel, Omer Levy, Graham Neubig | MoE | 79 / 1,051 / 0 | 25 May 2019 |
| Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned | Elena Voita, David Talbot, F. Moiseev, Rico Sennrich, Ivan Titov | | 76 / 1,120 / 0 | 23 May 2019 |
| SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems | Alex Jinpeng Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman | ELM | 183 / 2,296 / 0 | 02 May 2019 |
| Efficient Attention: Attention with Linear Complexities | Zhuoran Shen, Mingyuan Zhang, Haiyu Zhao, Shuai Yi, Hongsheng Li | | 83 / 519 / 0 | 04 Dec 2018 |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova | VLM, SSL, SSeg | 966 / 93,936 / 0 | 11 Oct 2018 |
| The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks | Jonathan Frankle, Michael Carbin | | 170 / 3,433 / 0 | 09 Mar 2018 |
| Attention Is All You Need | Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin | 3DV | 453 / 129,831 / 0 | 12 Jun 2017 |
| Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU | Jacob Devlin | | 49 / 36 / 0 | 04 May 2017 |
| Distilling the Knowledge in a Neural Network | Geoffrey E. Hinton, Oriol Vinyals, J. Dean | FedML | 241 / 19,523 / 0 | 09 Mar 2015 |