Don't Use Large Mini-Batches, Use Local SGD

22 August 2018 · arXiv:1808.07217
Tao R. Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi
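
The paper's title refers to local SGD: rather than synchronizing a large global mini-batch gradient at every step, each worker runs several SGD steps on its own data shard and the worker models are averaged only periodically. The sketch below is a minimal illustration of that idea, not the authors' implementation; the toy linear-regression objective, the NumPy simulation of workers, and all hyperparameters (K, H, learning rate, batch size) are assumptions chosen purely for demonstration.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy regression data: y = X @ w_true + noise, split evenly across K workers.
    n, d, K = 1024, 10, 8
    X = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    y = X @ w_true + 0.01 * rng.normal(size=n)
    shards = [(X[i::K], y[i::K]) for i in range(K)]   # one data shard per worker

    def local_steps(w, Xk, yk, H, lr=0.05, batch=8):
        # H plain mini-batch SGD steps on one worker's shard (no communication).
        for _ in range(H):
            idx = rng.integers(0, len(yk), size=batch)
            xb, yb = Xk[idx], yk[idx]
            grad = xb.T @ (xb @ w - yb) / batch       # grad of 0.5*mean((xb@w - yb)^2)
            w = w - lr * grad
        return w

    def local_sgd(rounds=50, H=10):
        # Each round: every worker takes H local steps, then the K models are averaged.
        # Communication happens once per round instead of once per SGD step.
        w = np.zeros(d)
        for _ in range(rounds):
            worker_models = [local_steps(w.copy(), Xk, yk, H) for Xk, yk in shards]
            w = np.mean(worker_models, axis=0)
        return w

    w_hat = local_sgd()
    print("||w_hat - w_true|| =", np.linalg.norm(w_hat - w_true))

Under this assumed setup, a synchronous large-mini-batch baseline would perform one all-reduce per SGD step, whereas the sketch above communicates only once per H local steps.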

Papers citing "Don't Use Large Mini-Batches, Use Local SGD"

21 of 271 citing papers shown

On the Linear Speedup Analysis of Communication Efficient Momentum SGD for Distributed Non-Convex Optimization
Hao Yu, R. L. Jin, Sen Yang · FedML · 09 May 2019

Communication trade-offs for synchronized distributed SGD with large step size
Kumar Kshitij Patel, Aymeric Dieuleveut · FedML · 25 Apr 2019

CleanML: A Study for Evaluating the Impact of Data Cleaning on ML Classification Tasks
Peng Li, Susie Xi Rao, Jennifer Blase, Yue Zhang, Xu Chu, Ce Zhang · 20 Apr 2019

A Distributed Hierarchical SGD Algorithm with Sparse Global Reduction
Fan Zhou, Guojing Cong · 12 Mar 2019

Machine Learning at the Wireless Edge: Distributed Stochastic Gradient Descent Over-the-Air
Mohammad Mohammadi Amiri, Deniz Gunduz · 03 Jan 2019

Federated Optimization in Heterogeneous Networks
Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, Virginia Smith · FedML · 14 Dec 2018

Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks
Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Rio Yokota, Satoshi Matsuoka · ODL · 29 Nov 2018

Measuring the Effects of Data Parallelism on Neural Network Training
Christopher J. Shallue, Jaehoon Lee, J. Antognini, J. Mamou, J. Ketterling, Yao Wang · 08 Nov 2018

Elastic CoCoA: Scaling In to Improve Convergence
Michael Kaufmann, Chia-Wen Cheng, K. Kourtis · 06 Nov 2018

Adaptive Communication Strategies to Achieve the Best Error-Runtime Trade-off in Local-Update SGD
Jianyu Wang, Gauri Joshi · FedML · 19 Oct 2018

Distributed Learning over Unreliable Networks
Chen Yu, Hanlin Tang, Cédric Renggli, S. Kassing, Ankit Singla, Dan Alistarh, Ce Zhang, Ji Liu · OOD · 17 Oct 2018

Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation
Nikolay Bogoychev, Marcin Junczys-Dowmunt, Kenneth Heafield, Alham Fikri Aji · ODL · 27 Aug 2018

Cooperative SGD: A Unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms
Jianyu Wang, Gauri Joshi · 22 Aug 2018

Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning
Hao Yu, Sen Yang, Shenghuo Zhu · MoMe, FedML · 17 Jul 2018

Local SGD Converges Fast and Communicates Little
Sebastian U. Stich · FedML · 24 May 2018

Communication Compression for Decentralized Training
Hanlin Tang, Shaoduo Gan, Ce Zhang, Tong Zhang, Ji Liu · 17 Mar 2018

Federated Meta-Learning with Fast Convergence and Efficient Communication
Fei Chen, Mi Luo, Zhenhua Dong, Zhenguo Li, Xiuqiang He · FedML · 22 Feb 2018

Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability
J. Keuper, Franz-Josef Pfreundt · GNN · 22 Sep 2016

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang · ODL · 15 Sep 2016

A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method
Simon Lacoste-Julien, Mark W. Schmidt, Francis R. Bach · 10 Dec 2012

Optimal Distributed Online Prediction using Mini-Batches
O. Dekel, Ran Gilad-Bachrach, Ohad Shamir, Lin Xiao · 07 Dec 2010