arXiv: 1811.05233 (v2, latest)
Massively Distributed SGD: ImageNet/ResNet-50 Training in a Flash
13 November 2018
Hiroaki Mikami, Hisahiro Suganuma, Pongsakorn U-chupala, Yoshiki Tanaka, Yuichi Kageyama
Papers citing "Massively Distributed SGD: ImageNet/ResNet-50 Training in a Flash" (26 of 26 papers shown):
SAGIPS: A Scalable Asynchronous Generative Inverse Problem Solver
Daniel Lersch, Malachi Schram, Zhenyu Dai, Kishansingh Rajput, Xingfu Wu, Nobuo Sato, J. T. Childers (11 Jun 2024)

High Throughput Training of Deep Surrogates from Large Ensemble Runs
Lucas Meyer, M. Schouler, R. Caulk, Alejandro Ribés, Bruno Raffin (28 Sep 2023) [AI4CE]

ABS: Adaptive Bounded Staleness Converges Faster and Communicates Less
Qiao Tan, Feng Zhu, Jingjing Zhang (21 Jan 2023)

Scaling the Wild: Decentralizing Hogwild!-style Shared-memory SGD
Bapi Chatterjee, Vyacheslav Kungurtsev, Dan Alistarh (13 Mar 2022) [FedML]

Themis: A Network Bandwidth-Aware Collective Scheduling Policy for Distributed Training of DL Models
Saeed Rashidi, William Won, Sudarshan Srinivasan, Srinivas Sridharan, T. Krishna (09 Oct 2021) [GNN]

Oscars: Adaptive Semi-Synchronous Parallel Model for Distributed Deep Learning with Global View
Sheng-Jun Huang (17 Feb 2021)

GradPIM: A Practical Processing-in-DRAM Architecture for Gradient Descent
Heesu Kim, Hanmin Park, Taehyun Kim, Kwanheum Cho, Eojin Lee, Soojung Ryu, Hyuk-Jae Lee, Kiyoung Choi, Jinho Lee (15 Feb 2021)

A Comprehensive Survey on Hardware-Aware Neural Architecture Search
Hadjer Benmeziane, Kaoutar El Maghraoui, Hamza Ouarnoughi, Smail Niar, Martin Wistuba, Naigang Wang (22 Jan 2021)

Study on the Large Batch Size Training of Neural Networks Based on the Second Order Gradient
Fengli Gao, Huicai Zhong (16 Dec 2020) [ODL]

Enabling Compute-Communication Overlap in Distributed Deep Learning Training Platforms
Saeed Rashidi, Matthew Denton, Srinivas Sridharan, Sudarshan Srinivasan, Amoghavarsha Suresh, Jade Nie, T. Krishna (30 Jun 2020)

The Limit of the Batch Size
Yang You, Yuhui Wang, Huan Zhang, Zhao-jie Zhang, J. Demmel, Cho-Jui Hsieh (15 Jun 2020)

Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures
Dhiraj D. Kalamkar, E. Georganas, Sudarshan Srinivasan, Jianping Chen, Mikhail Shiryaev, A. Heinecke (10 May 2020)

Communication optimization strategies for distributed deep neural network training: A survey
Shuo Ouyang, Dezun Dong, Yemao Xu, Liquan Xiao (06 Mar 2020)

Large Batch Training Does Not Need Warmup
Zhouyuan Huo, Bin Gu, Heng-Chiao Huang (04 Feb 2020) [AI4CE, ODL]

Accelerating Data Loading in Deep Neural Network Training
Chih-Chieh Yang, Guojing Cong (02 Oct 2019)

Gap Aware Mitigation of Gradient Staleness
Saar Barkai, Ido Hakimi, Assaf Schuster (24 Sep 2019)

Taming Momentum in a Distributed Asynchronous Environment
Ido Hakimi, Saar Barkai, Moshe Gabel, Assaf Schuster (26 Jul 2019)

Etalumis: Bringing Probabilistic Programming to Scientific Simulators at Scale
A. G. Baydin, Lei Shao, W. Bhimji, Lukas Heinrich, Lawrence Meadows, ..., Philip Torr, Victor W. Lee, Kyle Cranmer, P. Prabhat, Frank Wood (08 Jul 2019)

Database Meets Deep Learning: Challenges and Opportunities
Wei Wang, Meihui Zhang, Gang Chen, H. V. Jagadish, Beng Chin Ooi, K. Tan (21 Jun 2019)

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Yang You, Jing Li, Sashank J. Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, J. Demmel, Kurt Keutzer, Cho-Jui Hsieh (01 Apr 2019) [ODL]

Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect
Ang Li, Shuaiwen Leon Song, Jieyang Chen, Jiajia Li, Xu Liu, Nathan R. Tallent, Kevin J. Barker (11 Mar 2019) [GNN]

Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes
Peng Sun, Wansen Feng, Ruobing Han, Shengen Yan, Yonggang Wen (19 Feb 2019) [AI4CE]

Augment your batch: better training with larger batches
Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry (27 Jan 2019) [ODL]

Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks
Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Rio Yokota, Satoshi Matsuoka (29 Nov 2018) [ODL]

Large batch size training of neural networks with adversarial training and second-order information
Z. Yao, A. Gholami, Daiyaan Arfeen, Richard Liaw, Joseph E. Gonzalez, Kurt Keutzer, Michael W. Mahoney (02 Oct 2018) [ODL]

Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
Tal Ben-Nun, Torsten Hoefler (26 Feb 2018) [GNN]