Parallel SGD: When does averaging help?
Jian Zhang, Christopher De Sa, Ioannis Mitliagkas, Christopher Ré
arXiv:1606.07365, 23 June 2016
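The paper's title refers to one-shot parameter averaging: run SGD independently on disjoint shards of the data and average the resulting models once at the end. As a minimal illustrative sketch (not the paper's code; the least-squares problem, function names, and hyperparameters below are invented for the demo):

```python
import numpy as np

def sgd(X, y, lr=0.05, steps=500, seed=0):
    """Plain SGD on squared loss for a linear model (hypothetical demo problem)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        i = rng.integers(len(y))
        grad = (X[i] @ w - y[i]) * X[i]  # gradient of 0.5 * (x_i·w - y_i)^2
        w -= lr * grad
    return w

def one_shot_average(X, y, n_workers=4):
    """Split the data, run SGD independently per shard, average the models once."""
    shards = np.array_split(np.arange(len(y)), n_workers)
    models = [sgd(X[idx], y[idx], seed=k) for k, idx in enumerate(shards)]
    return np.mean(models, axis=0)

# Synthetic least-squares data where the true model is recoverable.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=400)

w_avg = one_shot_average(X, y)
```

On a well-conditioned convex problem like this, the averaged model lands close to each worker's solution; the paper's question is when this averaging actually helps over a single run, and when it does not.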
Papers citing "Parallel SGD: When does averaging help?" (9 papers)
EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models
Jialiang Cheng, Ning Gao, Yun Yue, Zhiling Ye, Jiadi Jiang, Jian Sha (10 Dec 2024)

Communication optimization strategies for distributed deep neural network training: A survey
Shuo Ouyang, Dezun Dong, Yemao Xu, Liquan Xiao (06 Mar 2020)

Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization
Xiangru Lian, Yijun Huang, Y. Li, Ji Liu (27 Jun 2015)

Splash: User-friendly Programming Interface for Parallelizing Stochastic Algorithms
Yuchen Zhang, Michael I. Jordan (24 Jun 2015)

Global Convergence of Stochastic Gradient Descent for Some Non-convex Matrix Problems
Christopher De Sa, K. Olukotun, Christopher Ré (05 Nov 2014)

DimmWitted: A Study of Main-Memory Statistical Analytics
Ce Zhang, Christopher Ré (28 Mar 2014)

HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
Feng Niu, Benjamin Recht, Christopher Ré, Stephen J. Wright (28 Jun 2011)

Distributed Delayed Stochastic Optimization
Alekh Agarwal, John C. Duchi (28 Apr 2011)

Optimal Distributed Online Prediction using Mini-Batches
O. Dekel, Ran Gilad-Bachrach, Ohad Shamir, Lin Xiao (07 Dec 2010)