Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes

30 July 2018
Xianyan Jia
Shutao Song
W. He
Yangzihao Wang
Haidong Rong
Feihu Zhou
Liqiang Xie
Zhenyu Guo
Yuanzhou Yang
Li Yu
Tiegang Chen
Guangxiao Hu
Shaoshuai Shi
Xiaowen Chu
ArXiv (abs) · PDF · HTML

Papers citing "Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes"

50 / 109 papers shown
ATTENTION2D: Communication Efficient Distributed Self-Attention Mechanism
Venmugil Elango
106
0
0
20 Mar 2025
Importance Sampling via Score-based Generative Models
Heasung Kim
Taekyun Lee
Hyeji Kim
Gustavo de Veciana
MedIm DiffM
202
2
0
07 Feb 2025
How to set AdamW's weight decay as you scale model and dataset size
Xi Wang
Laurence Aitchison
147
11
0
22 May 2024
PID-Comm: A Fast and Flexible Collective Communication Framework for Commodity Processing-in-DIMM Devices
Si Ung Noh
Junguk Hong
Chaemin Lim
Seong-Yeol Park
Jeehyun Kim
Hanjun Kim
Youngsok Kim
Jinho Lee
77
8
0
13 Apr 2024
Guaranteed Approximation Bounds for Mixed-Precision Neural Operators
Renbo Tu
Colin White
Jean Kossaifi
Boris Bonev
Nikola B. Kovachki
Gennady Pekhimenko
Kamyar Azizzadenesheli
Anima Anandkumar
64
11
0
27 Jul 2023
FFCV: Accelerating Training by Removing Data Bottlenecks
Guillaume Leclerc
Andrew Ilyas
Logan Engstrom
Sung Min Park
Hadi Salman
Aleksander Madry
61
70
0
21 Jun 2023
DeAR: Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining
Lin Zhang
Shaoshuai Shi
Xiaowen Chu
Wei Wang
Yue Liu
Chengjian Liu
61
11
0
24 Feb 2023
RedBit: An End-to-End Flexible Framework for Evaluating the Accuracy of Quantized CNNs
A. M. Ribeiro-dos-Santos
João Dinis Ferreira
O. Mutlu
G. Falcão
MQ
84
2
0
15 Jan 2023
Disjoint Masking with Joint Distillation for Efficient Masked Image Modeling
Xin Ma
Chang-Shu Liu
Chunyu Xie
Long Ye
Yafeng Deng
Xiang Ji
129
9
0
31 Dec 2022
Exploiting the Partly Scratch-off Lottery Ticket for Quantization-Aware Training
Mingliang Xu
Gongrui Nan
Yuxin Zhang
Chia-Wen Lin
Rongrong Ji
MQ
48
3
0
12 Nov 2022
Large-batch Optimization for Dense Visual Predictions
Zeyue Xue
Jianming Liang
Guanglu Song
Zhuofan Zong
Liang Chen
Yu Liu
Ping Luo
VLM
96
9
0
20 Oct 2022
Towards Efficient Communications in Federated Learning: A Contemporary Survey
Zihao Zhao
Yuzhu Mao
Yang Liu
Linqi Song
Ouyang Ye
Xinlei Chen
Wenbo Ding
FedML
95
63
0
02 Aug 2022
Scalable K-FAC Training for Deep Neural Networks with Distributed Preconditioning
Lin Zhang
Shaoshuai Shi
Wei Wang
Yue Liu
65
10
0
30 Jun 2022
One Hyper-Initializer for All Network Architectures in Medical Image Analysis
Fangxin Shang
Yehui Yang
Dalu Yang
Junde Wu
Xiaorong Wang
Yanwu Xu
AI4CE
68
2
0
08 Jun 2022
Guidelines for the Regularization of Gammas in Batch Normalization for Deep Residual Networks
Bum Jun Kim
Hyeyeon Choi
Hyeonah Jang
Dong Gu Lee
Wonseok Jeong
Sang Woo Kim
45
4
0
15 May 2022
Sign Bit is Enough: A Learning Synchronization Framework for Multi-hop All-reduce with Ultimate Compression
Feijie Wu
Shiqi He
Song Guo
Zhihao Qu
Yining Qi
W. Zhuang
Jie Zhang
59
9
0
14 Apr 2022
Auto-scaling Vision Transformers without Training
Wuyang Chen
Wei-Ping Huang
Xianzhi Du
Xiaodan Song
Zhangyang Wang
Denny Zhou
ViT
66
25
0
24 Feb 2022
TopoOpt: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs
Weiyang Wang
Moein Khazraee
Zhizhen Zhong
M. Ghobadi
Zhihao Jia
Dheevatsa Mudigere
Ying Zhang
A. Kewitsch
118
92
0
01 Feb 2022
Large-Scale Deep Learning Optimizations: A Comprehensive Survey
Xiaoxin He
Fuzhao Xue
Xiaozhe Ren
Yang You
83
15
0
01 Nov 2021
BitTrain: Sparse Bitmap Compression for Memory-Efficient Training on the Edge
Abdelrahman I. Hosny
Marina Neseem
Sherief Reda
MQ
89
4
0
29 Oct 2021
Synthesizing Optimal Parallelism Placement and Reduction Strategies on Hierarchical Systems for Deep Learning
Ningning Xie
Tamara Norman
Dominik Grewe
Dimitrios Vytiniotis
71
17
0
20 Oct 2021
EmbRace: Accelerating Sparse Communication for Distributed Training of NLP Neural Networks
Shengwei Li
Zhiquan Lai
Dongsheng Li
Yiming Zhang
Xiangyu Ye
Yabo Duan
FedML
54
3
0
18 Oct 2021
Themis: A Network Bandwidth-Aware Collective Scheduling Policy for Distributed Training of DL Models
Saeed Rashidi
William Won
Sudarshan Srinivasan
Srinivas Sridharan
T. Krishna
GNN
83
34
0
09 Oct 2021
Stochastic Training is Not Necessary for Generalization
Jonas Geiping
Micah Goldblum
Phillip E. Pope
Michael Moeller
Tom Goldstein
170
76
0
29 Sep 2021
Complexity-aware Adaptive Training and Inference for Edge-Cloud Distributed AI Systems
Yinghan Long
I. Chakraborty
G. Srinivasan
Kaushik Roy
48
15
0
14 Sep 2021
Accelerating Distributed K-FAC with Smart Parallelism of Computing and Communication Tasks
Shaoshuai Shi
Lin Zhang
Yue Liu
123
9
0
14 Jul 2021
ResIST: Layer-Wise Decomposition of ResNets for Distributed Training
Chen Dun
Cameron R. Wolfe
C. Jermaine
Anastasios Kyrillidis
87
21
0
02 Jul 2021
Dive into Deep Learning
Aston Zhang
Zachary Chase Lipton
Mu Li
Alexander J. Smola
VLM
87
570
0
21 Jun 2021
Concurrent Adversarial Learning for Large-Batch Training
Yong Liu
Xiangning Chen
Minhao Cheng
Cho-Jui Hsieh
Yang You
ODL
77
13
0
01 Jun 2021
Tesseract: Parallelize the Tensor Parallelism Efficiently
Boxiang Wang
Qifan Xu
Zhengda Bian
Yang You
VLM GNN
23
34
0
30 May 2021
Itsy Bitsy SpiderNet: Fully Connected Residual Network for Fraud Detection
S. Afanasiev
A. Smirnova
D. Kotereva
53
2
0
17 May 2021
ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training
Chia-Yu Chen
Jiamin Ni
Songtao Lu
Xiaodong Cui
Pin-Yu Chen
...
Naigang Wang
Swagath Venkataramani
Vijayalakshmi Srinivasan
Wei Zhang
K. Gopalakrishnan
79
67
0
21 Apr 2021
On-device Federated Learning with Flower
Akhil Mathur
Daniel J. Beutel
Pedro Porto Buarque de Gusmão
Javier Fernandez-Marques
Taner Topal
Xinchi Qiu
Titouan Parcollet
Yan Gao
Nicholas D. Lane
FedML
94
38
0
07 Apr 2021
Large Batch Simulation for Deep Reinforcement Learning
Brennan Shacklett
Erik Wijmans
Aleksei Petrenko
Manolis Savva
Dhruv Batra
V. Koltun
Kayvon Fatahalian
3DV OffRL AI4CE
88
26
0
12 Mar 2021
GRAD-MATCH: Gradient Matching based Data Subset Selection for Efficient Deep Model Training
Krishnateja Killamsetty
D. Sivasubramanian
Ganesh Ramakrishnan
A. De
Rishabh K. Iyer
OOD
157
207
0
27 Feb 2021
GradPIM: A Practical Processing-in-DRAM Architecture for Gradient Descent
Heesu Kim
Hanmin Park
Taehyun Kim
Kwanheum Cho
Eojin Lee
Soojung Ryu
Hyuk-Jae Lee
Kiyoung Choi
Jinho Lee
66
36
0
15 Feb 2021
Large-Scale Training System for 100-Million Classification at Alibaba
Liuyihan Song
Pan Pan
Kang Zhao
Hao Yang
Yiming Chen
Yingya Zhang
Yinghui Xu
Rong Jin
84
24
0
09 Feb 2021
Rethinking Floating Point Overheads for Mixed Precision DNN Accelerators
Hamzah Abdel-Aziz
Ali Shafiee
J. Shin
A. Pedram
Joseph Hassoun
MQ
67
11
0
27 Jan 2021
Crossover-SGD: A gossip-based communication in distributed deep learning for alleviating large mini-batch problem and enhancing scalability
Sangho Yeo
Minho Bae
Minjoong Jeong
Oh-Kyoung Kwon
Sangyoon Oh
50
3
0
30 Dec 2020
FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training
Y. Fu
Haoran You
Yang Zhao
Yue Wang
Chaojian Li
K. Gopalakrishnan
Zhangyang Wang
Yingyan Lin
MQ
81
32
0
24 Dec 2020
Data optimization for large batch distributed training of deep neural networks
Shubhankar Gahlot
Junqi Yin
Mallikarjun Shankar
23
1
0
16 Dec 2020
Training EfficientNets at Supercomputer Scale: 83% ImageNet Top-1 Accuracy in One Hour
Arissa Wongpanich
Hieu H. Pham
J. Demmel
Mingxing Tan
Quoc V. Le
Yang You
Sameer Kumar
65
8
0
30 Oct 2020
A Closer Look at Codistillation for Distributed Training
Shagun Sodhani
Olivier Delalleau
Mahmoud Assran
Koustuv Sinha
Nicolas Ballas
Michael G. Rabbat
123
8
0
06 Oct 2020
VirtualFlow: Decoupling Deep Learning Models from the Underlying Hardware
Andrew Or
Haoyu Zhang
M. Freedman
73
10
0
20 Sep 2020
Communication-efficient Decentralized Machine Learning over Heterogeneous Networks
Pan Zhou
Qian Lin
Dumitrel Loghin
Beng Chin Ooi
Yuncheng Wu
Hongfang Yu
63
37
0
12 Sep 2020
Flower: A Friendly Federated Learning Research Framework
Daniel J. Beutel
Taner Topal
Akhil Mathur
Xinchi Qiu
Javier Fernandez-Marques
...
Lorenzo Sani
Kwing Hei Li
Titouan Parcollet
Pedro Porto Buarque de Gusmão
Nicholas D. Lane
FedML
142
822
0
28 Jul 2020
Enabling On-Device CNN Training by Self-Supervised Instance Filtering and Error Map Pruning
Yawen Wu
Zhepeng Wang
Yiyu Shi
Jiaxi Hu
74
46
0
07 Jul 2020
DAPPLE: A Pipelined Data Parallel Approach for Training Large Models
Shiqing Fan
Yi Rong
Chen Meng
Zongyan Cao
Siyu Wang
...
Jun Yang
Lixue Xia
Lansong Diao
Xiaoyong Liu
Wei Lin
96
240
0
02 Jul 2020
The Limit of the Batch Size
Yang You
Yuhui Wang
Huan Zhang
Zhao-jie Zhang
J. Demmel
Cho-Jui Hsieh
121
15
0
15 Jun 2020
O(1) Communication for Distributed SGD through Two-Level Gradient Averaging
Subhadeep Bhattacharya
Weikuan Yu
Fahim Chowdhury
FedML
19
2
0
12 Jun 2020