Scaling Distributed Training with Adaptive Summation (arXiv:2006.02924)

4 June 2020
Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi, Tianju Xu, Vadim Eksarevskiy, Jaliya Ekanayake, Emad Barsoum

Papers citing "Scaling Distributed Training with Adaptive Summation"

4 papers shown
Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression
Jaeyong Song, Jinkyu Yim, Jaewon Jung, Hongsun Jang, H. Kim, Youngsok Kim, Jinho Lee
24 Jan 2023
BFTrainer: Low-Cost Training of Neural Networks on Unfillable Supercomputer Nodes
Zhengchun Liu, R. Kettimuthu, M. Papka, Ian T. Foster
22 Jun 2021
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
17 Sep 2019
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
15 Sep 2016