Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes

30 July 2018
Xianyan Jia, Shutao Song, W. He, Yangzihao Wang, Haidong Rong, Feihu Zhou, Liqiang Xie, Zhenyu Guo, Yuanzhou Yang, Li Yu, Tiegang Chen, Guangxiao Hu, Shaoshuai Shi, Xiaowen Chu

Papers citing "Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes"

50 / 109 papers shown
HetPipe: Enabling Large DNN Training on (Whimpy) Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism
Jay H. Park, Gyeongchan Yun, Chang Yi, N. T. Nguyen, Seungmin Lee, Jaesik Choi, S. Noh, Young-ri Choi
MoE · 86 · 134 · 0 · 28 May 2020
OD-SGD: One-step Delay Stochastic Gradient Descent for Distributed Training
Yemao Xu, Dezun Dong, Weixia Xu, Xiangke Liao
29 · 7 · 0 · 14 May 2020
Measuring the Algorithmic Efficiency of Neural Networks
Danny Hernandez, Tom B. Brown
289 · 97 · 0 · 08 May 2020
A Generalization of the Allreduce Operation
D. Kolmakov, Xuecang Zhang
20 · 5 · 0 · 20 Apr 2020
Learning 2-opt Heuristics for the Traveling Salesman Problem via Deep Reinforcement Learning
P. Costa, Jason Rhuggenaath, Yingqian Zhang, A. Akçay
97 · 143 · 0 · 03 Apr 2020
A unifying mutual information view of metric learning: cross-entropy vs. pairwise losses
Malik Boudiaf, Jérôme Rony, Imtiaz Masud Ziko, Eric Granger, M. Pedersoli, Pablo Piantanida, Ismail Ben Ayed
SSL · 103 · 160 · 0 · 19 Mar 2020
Convergence of Artificial Intelligence and High Performance Computing on NSF-supported Cyberinfrastructure
Eliu A. Huerta, Asad Khan, Edward Davis, Colleen Bushell, W. Gropp, ..., S. Koric, William T. C. Kramer, Brendan McGinty, Kenton McHenry, Aaron Saxton
AI4CE · 107 · 45 · 0 · 18 Mar 2020
Communication optimization strategies for distributed deep neural network training: A survey
Shuo Ouyang, Dezun Dong, Yemao Xu, Liquan Xiao
116 · 12 · 0 · 06 Mar 2020
Distributed Training of Deep Neural Network Acoustic Models for Automatic Speech Recognition
Xiaodong Cui, Wei Zhang, Ulrich Finkler, G. Saon, M. Picheny, David S. Kung
39 · 19 · 0 · 24 Feb 2020
Communication Contention Aware Scheduling of Multiple Deep Learning Training Jobs
Qiang-qiang Wang, Shaoshuai Shi, Canhui Wang, Xiaowen Chu
70 · 13 · 0 · 24 Feb 2020
Communication-Efficient Decentralized Learning with Sparsification and Adaptive Peer Selection
Zhenheng Tang, Shaoshuai Shi, Xiaowen Chu
FedML · 62 · 58 · 0 · 22 Feb 2020
Scalable and Practical Natural Gradient for Large-Scale Deep Learning
Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Chuan-Sheng Foo, Rio Yokota
85 · 37 · 0 · 13 Feb 2020
HyperSched: Dynamic Resource Reallocation for Model Development on a Deadline
Richard Liaw, Romil Bhardwaj, Lisa Dunlap, Yitian Zou, Joseph E. Gonzalez, Ion Stoica, Alexey Tumanov
74 · 45 · 0 · 08 Jan 2020
Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well
Vipul Gupta, S. Serrano, D. DeCoste
MoMe · 83 · 60 · 0 · 07 Jan 2020
Optimization for deep learning: theory and algorithms
Ruoyu Sun
ODL · 137 · 169 · 0 · 19 Dec 2019
Understanding Top-k Sparsification in Distributed Deep Learning
Shaoshuai Shi, Xiaowen Chu, Ka Chun Cheung, Simon See
233 · 101 · 0 · 20 Nov 2019
Layer-wise Adaptive Gradient Sparsification for Distributed Deep Learning with Convergence Guarantees
Shaoshuai Shi, Zhenheng Tang, Qiang-qiang Wang, Kaiyong Zhao, Xiaowen Chu
65 · 22 · 0 · 20 Nov 2019
Understanding the Disharmony between Weight Normalization Family and Weight Decay: $ε$-shifted $L_2$ Regularizer
Xiang Li, Shuo Chen, Yan Xia, Jian Yang
59 · 2 · 0 · 14 Nov 2019
E2-Train: Training State-of-the-art CNNs with Over 80% Energy Savings
Yue Wang, Ziyu Jiang, Xiaohan Chen, Pengfei Xu, Yang Zhao, Yingyan Lin, Zhangyang Wang
MQ · 104 · 83 · 0 · 29 Oct 2019
Highly-scalable, physics-informed GANs for learning solutions of stochastic PDEs
Liu Yang, Sean Treichler, Thorsten Kurth, Keno Fischer, D. Barajas-Solano, ..., Valentin Churavy, A. Tartakovsky, Michael Houston, P. Prabhat, George Karniadakis
AI4CE · 84 · 39 · 0 · 29 Oct 2019
Accelerating Data Loading in Deep Neural Network Training
Chih-Chieh Yang, Guojing Cong
66 · 38 · 0 · 02 Oct 2019
MLPerf Training Benchmark
Arya D. McCarthy, Christine Cheng, Cody Coleman, Greg Diamos, Paulius Micikevicius, ..., Carole-Jean Wu, Lingjie Xu, Masafumi Yamazaki, C. Young, Matei A. Zaharia
107 · 316 · 0 · 02 Oct 2019
A Baseline for Few-Shot Image Classification
Guneet Singh Dhillon, Pratik Chaudhari, Avinash Ravichandran, Stefano Soatto
125 · 583 · 0 · 06 Sep 2019
Lookahead Optimizer: k steps forward, 1 step back
Michael Ruogu Zhang, James Lucas, Geoffrey E. Hinton, Jimmy Ba
ODL · 164 · 734 · 0 · 19 Jul 2019
Fast Training of Sparse Graph Neural Networks on Dense Hardware
Matej Balog, B. V. Merrienboer, Subhodeep Moitra, Yujia Li, Daniel Tarlow
GNN · 58 · 10 · 0 · 27 Jun 2019
Database Meets Deep Learning: Challenges and Opportunities
Wei Wang, Meihui Zhang, Gang Chen, H. V. Jagadish, Beng Chin Ooi, K. Tan
82 · 148 · 0 · 21 Jun 2019
Deep Leakage from Gradients
Ligeng Zhu, Zhijian Liu, Song Han
FedML · 110 · 2,229 · 0 · 21 Jun 2019
Layered SGD: A Decentralized and Synchronous SGD Algorithm for Scalable Deep Neural Network Training
K. Yu, Thomas Flynn, Shinjae Yoo, N. D'Imperio
OffRL · 58 · 6 · 0 · 13 Jun 2019
Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources
Yanghua Peng, Hang Zhang, Yifei Ma, Tong He, Zhi-Li Zhang, Sheng Zha, Mu Li
50 · 23 · 0 · 26 Apr 2019
Distributed Deep Learning Strategies For Automatic Speech Recognition
Wei Zhang, Xiaodong Cui, Ulrich Finkler, Brian Kingsbury, G. Saon, David S. Kung, M. Picheny
67 · 29 · 0 · 10 Apr 2019
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Yang You, Jing Li, Sashank J. Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, J. Demmel, Kurt Keutzer, Cho-Jui Hsieh
ODL · 281 · 999 · 0 · 01 Apr 2019
Yet Another Accelerated SGD: ResNet-50 Training on ImageNet in 74.7 seconds
Masafumi Yamazaki, Akihiko Kasagi, Akihiro Tabuchi, Takumi Honda, Masahiro Miwa, Naoto Fukumoto, Tsuguchika Tabaru, Atsushi Ike, Kohta Nakashima
49 · 88 · 0 · 29 Mar 2019
Inefficiency of K-FAC for Large Batch Size Training
Linjian Ma, Gabe Montague, Jiayu Ye, Z. Yao, A. Gholami, Kurt Keutzer, Michael W. Mahoney
49 · 24 · 0 · 14 Mar 2019
Modularity as a Means for Complexity Management in Neural Networks Learning
David Castillo-Bolado, Cayetano Guerra, M. Hernández-Tejera
32 · 5 · 0 · 25 Feb 2019
Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes
Peng Sun, Wansen Feng, Ruobing Han, Shengen Yan, Yonggang Wen
AI4CE · 88 · 70 · 0 · 19 Feb 2019
TF-Replicator: Distributed Machine Learning for Researchers
P. Buchlovsky, David Budden, Dominik Grewe, Chris Jones, John Aslanides, ..., Aidan Clark, Sergio Gomez Colmenarejo, Aedan Pope, Fabio Viola, Dan Belov
GNN · OffRL · AI4CE · 76 · 20 · 0 · 01 Feb 2019
A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning
Tal Ben-Nun, Maciej Besta, Simon Huber, A. Ziogas, D. Peter, Torsten Hoefler
ELM · ALM · 69 · 78 · 0 · 29 Jan 2019
Quasi-Newton Methods for Machine Learning: Forget the Past, Just Sample
A. Berahas, Majid Jahani, Peter Richtárik, Martin Takáč
102 · 41 · 0 · 28 Jan 2019
Augment your batch: better training with larger batches
Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry
ODL · 118 · 76 · 0 · 27 Jan 2019
Large-Batch Training for LSTM and Beyond
Yang You, Jonathan Hseu, Chris Ying, J. Demmel, Kurt Keutzer, Cho-Jui Hsieh
56 · 91 · 0 · 24 Jan 2019
A Distributed Synchronous SGD Algorithm with Global Top-$k$ Sparsification for Low Bandwidth Networks
Shaoshuai Shi, Qiang-qiang Wang, Kaiyong Zhao, Zhenheng Tang, Yuxin Wang, Xiang Huang, Xiaowen Chu
79 · 136 · 0 · 14 Jan 2019
CROSSBOW: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers
A. Koliousis, Pijika Watcharapichat, Matthias Weidlich, Kai Zou, Paolo Costa, Peter R. Pietzuch
57 · 70 · 0 · 08 Jan 2019
An Empirical Model of Large-Batch Training
Sam McCandlish, Jared Kaplan, Dario Amodei, OpenAI Dota Team
76 · 280 · 0 · 14 Dec 2018
Nonlinear Conjugate Gradients For Scaling Synchronous Distributed DNN Training
Saurabh N. Adya, Vinay Palakkode, Oncel Tuzel
23 · 4 · 0 · 07 Dec 2018
Bag of Tricks for Image Classification with Convolutional Neural Networks
Tong He, Zhi-Li Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li
293 · 1,421 · 0 · 04 Dec 2018
On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent
Noah Golmant, N. Vemuri, Z. Yao, Vladimir Feinberg, A. Gholami, Kai Rothauge, Michael W. Mahoney, Joseph E. Gonzalez
92 · 73 · 0 · 30 Nov 2018
Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks
Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Rio Yokota, Satoshi Matsuoka
ODL · 107 · 95 · 0 · 29 Nov 2018
MG-WFBP: Efficient Data Communication for Distributed Synchronous SGD Algorithms
Shaoshuai Shi, Xiaowen Chu, Bo Li
FedML · 91 · 90 · 0 · 27 Nov 2018
Stochastic Gradient Push for Distributed Deep Learning
Mahmoud Assran, Nicolas Loizou, Nicolas Ballas, Michael G. Rabbat
110 · 347 · 0 · 27 Nov 2018
A Simple Non-i.i.d. Sampling Approach for Efficient Training and Better Generalization
Bowen Cheng, Yunchao Wei, Jiahui Yu, Shiyu Chang, Jinjun Xiong, Wen-mei W. Hwu, Thomas S. Huang, Humphrey Shi
OOD · VLM · 103 · 6 · 0 · 23 Nov 2018