Don't Use Large Mini-Batches, Use Local SGD (arXiv:1808.07217)
22 August 2018
Tao R. Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi
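For readers unfamiliar with the titular method: in local SGD each worker runs several SGD steps on its own data shard, and the workers only periodically average their models, instead of synchronizing one large mini-batch gradient at every step. The sketch below is a minimal toy simulation of that loop (NumPy, synthetic least-squares data, hypothetical hyperparameters), not the authors' implementation.

```python
# Minimal simulation of local SGD (periodic model averaging) on a toy
# least-squares problem. All names and hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = X w* + noise, split evenly across workers.
n_workers, n_per_worker, dim = 4, 256, 10
w_star = rng.normal(size=dim)
X = rng.normal(size=(n_workers, n_per_worker, dim))
y = X @ w_star + 0.01 * rng.normal(size=(n_workers, n_per_worker))

def local_sgd(local_steps, rounds=50, lr=0.05, batch=8):
    """Each worker runs `local_steps` SGD steps on its own shard,
    then all workers average their models (one communication round)."""
    w = np.zeros(dim)                         # shared starting point
    for _ in range(rounds):
        local_models = []
        for k in range(n_workers):
            w_k = w.copy()                    # start from the averaged model
            for _ in range(local_steps):
                idx = rng.integers(0, n_per_worker, size=batch)
                Xb, yb = X[k][idx], y[k][idx]
                grad = Xb.T @ (Xb @ w_k - yb) / batch
                w_k -= lr * grad
            local_models.append(w_k)
        w = np.mean(local_models, axis=0)     # communication: average models
    return w

for H in (1, 4, 16):                          # H = 1 is fully synchronous SGD
    w = local_sgd(H)
    print(f"local steps = {H:2d}  ->  ||w - w*|| = {np.linalg.norm(w - w_star):.4f}")
```

With local_steps = 1 this reduces to ordinary synchronous mini-batch SGD; larger values trade gradient freshness for fewer communication rounds.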
Papers citing "Don't Use Large Mini-Batches, Use Local SGD" (21 of 271 shown)
On the Linear Speedup Analysis of Communication Efficient Momentum SGD for Distributed Non-Convex Optimization
Hao Yu, R. L. Jin, Sen Yang · FedML · 378 citations · 09 May 2019

Communication trade-offs for synchronized distributed SGD with large step size
Kumar Kshitij Patel, Aymeric Dieuleveut · FedML · 27 citations · 25 Apr 2019

CleanML: A Study for Evaluating the Impact of Data Cleaning on ML Classification Tasks
Peng Li, Susie Xi Rao, Jennifer Blase, Yue Zhang, Xu Chu, Ce Zhang · 41 citations · 20 Apr 2019

A Distributed Hierarchical SGD Algorithm with Sparse Global Reduction
Fan Zhou, Guojing Cong · 8 citations · 12 Mar 2019

Machine Learning at the Wireless Edge: Distributed Stochastic Gradient Descent Over-the-Air
Mohammad Mohammadi Amiri, Deniz Gunduz · 53 citations · 03 Jan 2019

Federated Optimization in Heterogeneous Networks
Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, Virginia Smith · FedML · 5,011 citations · 14 Dec 2018

Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks
Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Rio Yokota, Satoshi Matsuoka · ODL · 95 citations · 29 Nov 2018

Measuring the Effects of Data Parallelism on Neural Network Training
Christopher J. Shallue, Jaehoon Lee, J. Antognini, J. Mamou, J. Ketterling, Yao Wang · 407 citations · 08 Nov 2018

Elastic CoCoA: Scaling In to Improve Convergence
Michael Kaufmann, Chia-Wen Cheng, K. Kourtis · 3 citations · 06 Nov 2018

Adaptive Communication Strategies to Achieve the Best Error-Runtime Trade-off in Local-Update SGD
Jianyu Wang, Gauri Joshi · FedML · 231 citations · 19 Oct 2018

Distributed Learning over Unreliable Networks
Chen Yu, Hanlin Tang, Cédric Renggli, S. Kassing, Ankit Singla, Dan Alistarh, Ce Zhang, Ji Liu · OOD · 59 citations · 17 Oct 2018

Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation
Nikolay Bogoychev, Marcin Junczys-Dowmunt, Kenneth Heafield, Alham Fikri Aji · ODL · 17 citations · 27 Aug 2018

Cooperative SGD: A Unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms
Jianyu Wang, Gauri Joshi · 348 citations · 22 Aug 2018

Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning
Hao Yu, Sen Yang, Shenghuo Zhu · MoMe, FedML · 594 citations · 17 Jul 2018

Local SGD Converges Fast and Communicates Little
Sebastian U. Stich · FedML · 1,042 citations · 24 May 2018

Communication Compression for Decentralized Training
Hanlin Tang, Shaoduo Gan, Ce Zhang, Tong Zhang, Ji Liu · 270 citations · 17 Mar 2018

Federated Meta-Learning with Fast Convergence and Efficient Communication
Fei Chen, Mi Luo, Zhenhua Dong, Zhenguo Li, Xiuqiang He · FedML · 388 citations · 22 Feb 2018

Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability
J. Keuper, Franz-Josef Pfreundt · GNN · 97 citations · 22 Sep 2016

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang · ODL · 2,888 citations · 15 Sep 2016

A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method
Simon Lacoste-Julien, Mark W. Schmidt, Francis R. Bach · 259 citations · 10 Dec 2012

Optimal Distributed Online Prediction using Mini-Batches
O. Dekel, Ran Gilad-Bachrach, Ohad Shamir, Lin Xiao · 683 citations · 07 Dec 2010