Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers
arXiv: 2109.10686 · 22 September 2021
Yi Tay, Mostafa Dehghani, J. Rao, W. Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler

Papers citing "Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers" (6 / 6 papers shown)

MLP-Mixer: An all-MLP Architecture for Vision
Ilya O. Tolstikhin, N. Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, ..., Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy
04 May 2021

Carbon Emissions and Large Neural Network Training
David A. Patterson, Joseph E. Gonzalez, Quoc V. Le, Chen Liang, Lluís-Miquel Munguía, D. Rothchild, David R. So, Maud Texier, J. Dean
21 Apr 2021

Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with 1/n Parameters
Aston Zhang, Yi Tay, Shuai Zhang, Alvin Chan, A. Luu, S. Hui, Jie Fu
17 Feb 2021

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
23 Jan 2020

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
17 Sep 2019

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
20 Apr 2018