DropCompute: simple and more robust distributed synchronous training via compute variance reduction

18 June 2023

Papers citing "DropCompute: simple and more robust distributed synchronous training via compute variance reduction"

Understanding Stragglers in Large Model Training Using What-if Analysis. Jinkun Lin, Ziheng Jiang, Zuquan Song, Sida Zhao, Menghan Yu, ..., Shuguang Wang, Haibin Lin, Xin Liu, Aurojit Panda, Jinyang Li. 09 May 2025.
From promise to practice: realizing high-performance decentralized training. Zesen Wang, Jiaojiao Zhang, Xuyang Wu, M. Johansson. 15 Oct 2024.
Stochastic Training is Not Necessary for Generalization. Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein. 29 Sep 2021.
ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training. Chia-Yu Chen, Jiamin Ni, Songtao Lu, Xiaodong Cui, Pin-Yu Chen, ..., Naigang Wang, Swagath Venkataramani, Vijayalakshmi Srinivasan, Wei Zhang, K. Gopalakrishnan. 21 Apr 2021.
Scaling Laws for Neural Language Models. Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei. 23 Jan 2020.
Optimal Distributed Online Prediction using Mini-Batches. O. Dekel, Ran Gilad-Bachrach, Ohad Shamir, Lin Xiao. 07 Dec 2010.