M6-10T: A Sharing-Delinking Paradigm for Efficient Multi-Trillion Parameter Pretraining
arXiv: 2110.03888 · 8 October 2021
Junyang Lin, An Yang, Jinze Bai, Chang Zhou, Le Jiang, Xianyan Jia, Ang Wang, Jie M. Zhang, Yong Li, Wei Lin, Jingren Zhou, Hongxia Yang
MoE
Papers citing "M6-10T: A Sharing-Delinking Paradigm for Efficient Multi-Trillion Parameter Pretraining" (5 of 5 papers shown)
Carbon Emissions and Large Neural Network Training
David A. Patterson, Joseph E. Gonzalez, Quoc V. Le, Chen Liang, Lluís-Miquel Munguía, D. Rothchild, David R. So, Maud Texier, J. Dean
AI4CE
21 Apr 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever
VLM
24 Feb 2021
ZeRO-Offload: Democratizing Billion-Scale Model Training
Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyang Yang, Minjia Zhang, Dong Li, Yuxiong He
MoE
18 Jan 2021
Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
23 Jan 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
MoE
17 Sep 2019