On the Performance and Memory Footprint of Distributed Training: An Empirical Study on Transformers

2 July 2024

Tao Li

Papers citing "On the Performance and Memory Footprint of Distributed Training: An Empirical Study on Transformers"

Title
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro | MoE | 243 | 1,815 | 0 | 17 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding | Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman | ELM | 294 | 6,943 | 0 | 20 Apr 2018