AdaScale SGD: A User-Friendly Algorithm for Distributed Training
Tyler B. Johnson, Pulkit Agrawal, Haijie Gu, Carlos Guestrin
arXiv:2007.05105 · 9 July 2020 · [ODL]
Papers citing "AdaScale SGD: A User-Friendly Algorithm for Distributed Training" (20 of 20 papers shown):
- Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism. Tim Tsz-Kit Lau, Weijian Li, Chenwei Xu, Han Liu, Mladen Kolar. 30 Dec 2024.
- A Method for Enhancing Generalization of Adam by Multiple Integrations. Long Jin, Han Nong, Liangming Chen, Zhenming Su. 17 Dec 2024.
- Communication-Efficient Adaptive Batch Size Strategies for Distributed Local Gradient Methods. Tim Tsz-Kit Lau, Weijian Li, Chenwei Xu, Han Liu, Mladen Kolar. 20 Jun 2024.
- AdAdaGrad: Adaptive Batch Size Schemes for Adaptive Gradient Methods. Tim Tsz-Kit Lau, Han Liu, Mladen Kolar. [ODL] 17 Feb 2024.
- Flexible Communication for Optimal Distributed Learning over Unpredictable Networks. S. Tyagi, Martin Swany. 05 Dec 2023.
- An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent. Zhao-quan Song, Chiwun Yang. 17 Oct 2023.
- Accelerating Distributed ML Training via Selective Synchronization. S. Tyagi, Martin Swany. [FedML] 16 Jul 2023.
- GraVAC: Adaptive Compression for Communication-Efficient Distributed DL Training. S. Tyagi, Martin Swany. 20 May 2023.
- Scavenger: A Cloud Service for Optimizing Cost and Performance of ML Training. S. Tyagi, Prateek Sharma. 12 Mar 2023.
- KHAN: Knowledge-Aware Hierarchical Attention Networks for Accurate Political Stance Prediction. Yunyong Ko, Seongeun Ryu, Soeun Han, Youngseung Jeon, Jaehoon Kim, Sohyun Park, Kyungsik Han, Hanghang Tong, Sang-Wook Kim. 23 Feb 2023.
- FedExP: Speeding Up Federated Averaging via Extrapolation. Divyansh Jhunjhunwala, Shiqiang Wang, Gauri Joshi. [FedML] 23 Jan 2023.
- Large-batch Optimization for Dense Visual Predictions. Zeyue Xue, Jianming Liang, Guanglu Song, Zhuofan Zong, Liang Chen, Yu Liu, Ping Luo. [VLM] 20 Oct 2022.
- Adaptive Learning Rates for Faster Stochastic Gradient Methods. Samuel Horváth, Konstantin Mishchenko, Peter Richtárik. [ODL] 10 Aug 2022.
- Byzantine Fault Tolerance in Distributed Machine Learning: A Survey. Djamila Bouhata, Hamouma Moumen, Moumen Hamouma, Ahcène Bounceur. [AI4CE] 05 May 2022.
- ResIST: Layer-Wise Decomposition of ResNets for Distributed Training. Chen Dun, Cameron R. Wolfe, C. Jermaine, Anastasios Kyrillidis. 02 Jul 2021.
- Consensus Control for Decentralized Deep Learning. Lingjing Kong, Tao R. Lin, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich. 09 Feb 2021.
- Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters. S. Shi, Xianhao Zhou, Shutao Song, Xingyao Wang, Zilin Zhu, ..., Chenyang Guo, Bo Yang, Zhibo Chen, Yongjian Wu, X. Chu. [GNN] 20 Oct 2020.
- A Closer Look at Codistillation for Distributed Training. Shagun Sodhani, Olivier Delalleau, Mahmoud Assran, Koustuv Sinha, Nicolas Ballas, Michael G. Rabbat. 06 Oct 2020.
- Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers. Robin M. Schmidt, Frank Schneider, Philipp Hennig. [ODL] 03 Jul 2020.
- Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition. Hamed Karimi, J. Nutini, Mark W. Schmidt. 16 Aug 2016.