Don't Use Large Mini-Batches, Use Local SGD

22 August 2018 · arXiv:1808.07217
Tao R. Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi
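
The paper's title refers to local SGD: rather than synchronizing a large global mini-batch gradient at every step, each worker runs several SGD steps on its own data shard and the worker models are averaged only periodically. The sketch below is a minimal illustration of that idea, not the authors' implementation; the toy linear-regression objective, the NumPy simulation of workers, and all hyperparameters (K, H, learning rate, batch size) are assumptions chosen purely for demonstration.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy regression data: y = X @ w_true + noise, split evenly across K workers.
    n, d, K = 1024, 10, 8
    X = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    y = X @ w_true + 0.01 * rng.normal(size=n)
    shards = [(X[i::K], y[i::K]) for i in range(K)]   # one data shard per worker

    def local_steps(w, Xk, yk, H, lr=0.05, batch=8):
        # H plain mini-batch SGD steps on one worker's shard (no communication).
        for _ in range(H):
            idx = rng.integers(0, len(yk), size=batch)
            xb, yb = Xk[idx], yk[idx]
            grad = xb.T @ (xb @ w - yb) / batch       # grad of 0.5*mean((xb@w - yb)^2)
            w = w - lr * grad
        return w

    def local_sgd(rounds=50, H=10):
        # Each round: every worker takes H local steps, then the K models are averaged.
        # Communication happens once per round instead of once per SGD step.
        w = np.zeros(d)
        for _ in range(rounds):
            worker_models = [local_steps(w.copy(), Xk, yk, H) for Xk, yk in shards]
            w = np.mean(worker_models, axis=0)
        return w

    w_hat = local_sgd()
    print("||w_hat - w_true|| =", np.linalg.norm(w_hat - w_true))

Under this assumed setup, a synchronous large-mini-batch baseline would perform one all-reduce per SGD step, whereas the sketch above communicates only once per H local steps.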

Papers citing "Don't Use Large Mini-Batches, Use Local SGD"

21 of 271 citing papers shown

On the Linear Speedup Analysis of Communication Efficient Momentum SGD for Distributed Non-Convex Optimization
Hao Yu, R. L. Jin, Sen Yang · FedML · 09 May 2019

Communication trade-offs for synchronized distributed SGD with large step size
Kumar Kshitij Patel, Aymeric Dieuleveut · FedML · 25 Apr 2019

CleanML: A Study for Evaluating the Impact of Data Cleaning on ML Classification Tasks
Peng Li, Susie Xi Rao, Jennifer Blase, Yue Zhang, Xu Chu, Ce Zhang · 20 Apr 2019

A Distributed Hierarchical SGD Algorithm with Sparse Global Reduction
Fan Zhou, Guojing Cong · 12 Mar 2019

Machine Learning at the Wireless Edge: Distributed Stochastic Gradient Descent Over-the-Air
Mohammad Mohammadi Amiri, Deniz Gunduz · 03 Jan 2019

Federated Optimization in Heterogeneous Networks
Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, Virginia Smith · FedML · 14 Dec 2018

Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks
Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Rio Yokota, Satoshi Matsuoka · ODL · 29 Nov 2018

Measuring the Effects of Data Parallelism on Neural Network Training
Christopher J. Shallue, Jaehoon Lee, J. Antognini, J. Mamou, J. Ketterling, Yao Wang · 08 Nov 2018

Elastic CoCoA: Scaling In to Improve Convergence
Michael Kaufmann, Chia-Wen Cheng, K. Kourtis · 06 Nov 2018

Adaptive Communication Strategies to Achieve the Best Error-Runtime Trade-off in Local-Update SGD
Jianyu Wang, Gauri Joshi · FedML · 19 Oct 2018

Distributed Learning over Unreliable Networks
Chen Yu, Hanlin Tang, Cédric Renggli, S. Kassing, Ankit Singla, Dan Alistarh, Ce Zhang, Ji Liu · OOD · 17 Oct 2018

Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation
Nikolay Bogoychev, Marcin Junczys-Dowmunt, Kenneth Heafield, Alham Fikri Aji · ODL · 27 Aug 2018

Cooperative SGD: A Unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms
Jianyu Wang, Gauri Joshi · 22 Aug 2018

Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning
Hao Yu, Sen Yang, Shenghuo Zhu · MoMe, FedML · 17 Jul 2018

Local SGD Converges Fast and Communicates Little
Sebastian U. Stich · FedML · 24 May 2018

Communication Compression for Decentralized Training
Hanlin Tang, Shaoduo Gan, Ce Zhang, Tong Zhang, Ji Liu · 17 Mar 2018

Federated Meta-Learning with Fast Convergence and Efficient Communication
Fei Chen, Mi Luo, Zhenhua Dong, Zhenguo Li, Xiuqiang He · FedML · 22 Feb 2018

Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability
J. Keuper, Franz-Josef Pfreundt · GNN · 22 Sep 2016

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang · ODL · 15 Sep 2016

A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method
Simon Lacoste-Julien, Mark W. Schmidt, Francis R. Bach · 10 Dec 2012

Optimal Distributed Online Prediction using Mini-Batches
O. Dekel, Ran Gilad-Bachrach, Ohad Shamir, Lin Xiao · 07 Dec 2010