arXiv:2407.02081
On the Performance and Memory Footprint of Distributed Training: An Empirical Study on Transformers
2 July 2024
Zhengxian Lu, Fangyu Wang, Zhiwei Xu, Fei Yang, Tao Li
Papers citing "On the Performance and Memory Footprint of Distributed Training: An Empirical Study on Transformers"
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro. 17 Sep 2019. Tag: MoE
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman. 20 Apr 2018. Tag: ELM