Massively Distributed SGD: ImageNet/ResNet-50 Training in a Flash

13 November 2018
Hiroaki Mikami, Hisahiro Suganuma, Pongsakorn U-chupala, Yoshiki Tanaka, Yuichi Kageyama
arXiv:1811.05233

Papers citing "Massively Distributed SGD: ImageNet/ResNet-50 Training in a Flash"

26 citing papers shown

SAGIPS: A Scalable Asynchronous Generative Inverse Problem Solver
Daniel Lersch, Malachi Schram, Zhenyu Dai, Kishansingh Rajput, Xingfu Wu, Nobuo Sato, J. T. Childers
11 Jun 2024

High Throughput Training of Deep Surrogates from Large Ensemble Runs
Lucas Meyer, M. Schouler, R. Caulk, Alejandro Ribés, Bruno Raffin
28 Sep 2023 · AI4CE

ABS: Adaptive Bounded Staleness Converges Faster and Communicates Less
Qiao Tan, Feng Zhu, Jingjing Zhang
21 Jan 2023

Scaling the Wild: Decentralizing Hogwild!-style Shared-memory SGD
Bapi Chatterjee, Vyacheslav Kungurtsev, Dan Alistarh
13 Mar 2022 · FedML

Themis: A Network Bandwidth-Aware Collective Scheduling Policy for Distributed Training of DL Models
Saeed Rashidi, William Won, Sudarshan Srinivasan, Srinivas Sridharan, T. Krishna
09 Oct 2021 · GNN

Oscars: Adaptive Semi-Synchronous Parallel Model for Distributed Deep Learning with Global View
Sheng-Jun Huang
17 Feb 2021

GradPIM: A Practical Processing-in-DRAM Architecture for Gradient Descent
Heesu Kim, Hanmin Park, Taehyun Kim, Kwanheum Cho, Eojin Lee, Soojung Ryu, Hyuk-Jae Lee, Kiyoung Choi, Jinho Lee
15 Feb 2021

A Comprehensive Survey on Hardware-Aware Neural Architecture Search
Hadjer Benmeziane, Kaoutar El Maghraoui, Hamza Ouarnoughi, Smail Niar, Martin Wistuba, Naigang Wang
22 Jan 2021

Study on the Large Batch Size Training of Neural Networks Based on the Second Order Gradient
Fengli Gao, Huicai Zhong
16 Dec 2020 · ODL

Enabling Compute-Communication Overlap in Distributed Deep Learning Training Platforms
Saeed Rashidi, Matthew Denton, Srinivas Sridharan, Sudarshan Srinivasan, Amoghavarsha Suresh, Jade Nie, T. Krishna
30 Jun 2020

The Limit of the Batch Size
Yang You, Yuhui Wang, Huan Zhang, Zhao-jie Zhang, J. Demmel, Cho-Jui Hsieh
15 Jun 2020

Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures
Dhiraj D. Kalamkar, E. Georganas, Sudarshan Srinivasan, Jianping Chen, Mikhail Shiryaev, A. Heinecke
10 May 2020

Communication optimization strategies for distributed deep neural network training: A survey
Shuo Ouyang, Dezun Dong, Yemao Xu, Liquan Xiao
06 Mar 2020

Large Batch Training Does Not Need Warmup
Zhouyuan Huo, Bin Gu, Heng-Chiao Huang
04 Feb 2020 · AI4CE, ODL

Accelerating Data Loading in Deep Neural Network Training
Chih-Chieh Yang, Guojing Cong
02 Oct 2019

Gap Aware Mitigation of Gradient Staleness
Saar Barkai, Ido Hakimi, Assaf Schuster
24 Sep 2019

Taming Momentum in a Distributed Asynchronous Environment
Ido Hakimi, Saar Barkai, Moshe Gabel, Assaf Schuster
26 Jul 2019

Etalumis: Bringing Probabilistic Programming to Scientific Simulators at Scale
A. G. Baydin, Lei Shao, W. Bhimji, Lukas Heinrich, Lawrence Meadows, ..., Philip Torr, Victor W. Lee, Kyle Cranmer, P. Prabhat, Frank Wood
08 Jul 2019

Database Meets Deep Learning: Challenges and Opportunities
Wei Wang, Meihui Zhang, Gang Chen, H. V. Jagadish, Beng Chin Ooi, K. Tan
21 Jun 2019

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Yang You, Jing Li, Sashank J. Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, J. Demmel, Kurt Keutzer, Cho-Jui Hsieh
01 Apr 2019 · ODL

Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect
Ang Li, Shuaiwen Leon Song, Jieyang Chen, Jiajia Li, Xu Liu, Nathan R. Tallent, Kevin J. Barker
11 Mar 2019 · GNN

Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes
Peng Sun, Wansen Feng, Ruobing Han, Shengen Yan, Yonggang Wen
19 Feb 2019 · AI4CE

Augment your batch: better training with larger batches
Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry
27 Jan 2019 · ODL

Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks
Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Rio Yokota, Satoshi Matsuoka
29 Nov 2018 · ODL

Large batch size training of neural networks with adversarial training and second-order information
Z. Yao, A. Gholami, Daiyaan Arfeen, Richard Liaw, Joseph E. Gonzalez, Kurt Keutzer, Michael W. Mahoney
02 Oct 2018 · ODL

Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
Tal Ben-Nun, Torsten Hoefler
26 Feb 2018 · GNN