ResearchTrend.AI
Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes

12 November 2017
Takuya Akiba, Shuji Suzuki, Keisuke Fukuda
arXiv:1711.04325

Papers citing "Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes"

36 / 36 papers shown
Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
Tim Tsz-Kit Lau, Weijian Li, Chenwei Xu, Han Liu, Mladen Kolar
30 Dec 2024
I/O in Machine Learning Applications on HPC Systems: A 360-degree Survey
Noah Lewis, J. L. Bez, Suren Byna
16 Apr 2024
Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression
Jaeyong Song, Jinkyu Yim, Jaewon Jung, Hongsun Jang, H. Kim, Youngsok Kim, Jinho Lee
GNN
24 Jan 2023
Analyzing I/O Performance of a Hierarchical HPC Storage System for Distributed Deep Learning
Takaaki Fukai, Kento Sato, Takahiro Hirofuchi
04 Jan 2023
Exploiting the Partly Scratch-off Lottery Ticket for Quantization-Aware Training
Mingliang Xu, Gongrui Nan, Yuxin Zhang, Rongrong Ji
MQ
12 Nov 2022
Numerical Optimizations for Weighted Low-rank Estimation on Language Model
Ting Hua, Yen-Chang Hsu, Felicity Wang, Qiang Lou, Yilin Shen, Hongxia Jin
02 Nov 2022
Transfer learning and Local interpretable model agnostic based visual approach in Monkeypox Disease Detection and Classification: A Deep Learning insights
M. Ahsan, Tareque Abu Abdullah, Md. Shahin Ali, Fatema Jahora, Md. Khairul Islam, Amin G. Alhashim, Kishor Datta Gupta
01 Nov 2022
A New Perspective for Understanding Generalization Gap of Deep Neural Networks Trained with Large Batch Sizes
O. Oyedotun, Konstantinos Papadopoulos, Djamila Aouada
AI4CE
21 Oct 2022
ResIST: Layer-Wise Decomposition of ResNets for Distributed Training
Chen Dun, Cameron R. Wolfe, C. Jermaine, Anastasios Kyrillidis
02 Jul 2021
Concurrent Adversarial Learning for Large-Batch Training
Yong Liu, Xiangning Chen, Minhao Cheng, Cho-Jui Hsieh, Yang You
ODL
01 Jun 2021
FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training
Y. Fu, Haoran You, Yang Katie Zhao, Yue Wang, Chaojian Li, K. Gopalakrishnan, Zhangyang Wang, Yingyan Lin
MQ
24 Dec 2020
Understanding Training Efficiency of Deep Learning Recommendation Models at Scale
Bilge Acun, Matthew Murphy, Xiaodong Wang, Jade Nie, Carole-Jean Wu, K. Hazelwood
11 Nov 2020
Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training
Dingqing Yang, Amin Ghasemazar, X. Ren, Maximilian Golub, G. Lemieux, Mieszko Lis
23 Sep 2020
Enabling On-Device CNN Training by Self-Supervised Instance Filtering and Error Map Pruning
Yawen Wu, Zhepeng Wang, Yiyu Shi, Jiaxi Hu
07 Jul 2020
Data Movement Is All You Need: A Case Study on Optimizing Transformers
A. Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, Torsten Hoefler
30 Jun 2020
Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures
Dhiraj D. Kalamkar, E. Georganas, Sudarshan Srinivasan, Jianping Chen, Mikhail Shiryaev, A. Heinecke
10 May 2020
Communication optimization strategies for distributed deep neural network training: A survey
Shuo Ouyang, Dezun Dong, Yemao Xu, Liquan Xiao
06 Mar 2020
Optimization for deep learning: theory and algorithms
Ruoyu Sun
ODL
19 Dec 2019
Fast Training of Sparse Graph Neural Networks on Dense Hardware
Matej Balog, B. V. Merrienboer, Subhodeep Moitra, Yujia Li, Daniel Tarlow
GNN
27 Jun 2019
Layered SGD: A Decentralized and Synchronous SGD Algorithm for Scalable Deep Neural Network Training
K. Yu, Thomas Flynn, Shinjae Yoo, N. D'Imperio
OffRL
13 Jun 2019
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Yang You, Jing Li, Sashank J. Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, J. Demmel, Kurt Keutzer, Cho-Jui Hsieh
ODL
01 Apr 2019
Speeding up Deep Learning with Transient Servers
Shijian Li, R. Walls, Lijie Xu, Tian Guo
28 Feb 2019
TF-Replicator: Distributed Machine Learning for Researchers
P. Buchlovsky, David Budden, Dominik Grewe, Chris Jones, John Aslanides, ..., Aidan Clark, Sergio Gomez Colmenarejo, Aedan Pope, Fabio Viola, Dan Belov
GNN, OffRL, AI4CE
01 Feb 2019
Quasi-Newton Methods for Machine Learning: Forget the Past, Just Sample
A. Berahas, Majid Jahani, Peter Richtárik, Martin Takáč
28 Jan 2019
AccUDNN: A GPU Memory Efficient Accelerator for Training Ultra-deep Neural Networks
Jinrong Guo, Wantao Liu, Wang Wang, Q. Lu, Songlin Hu, Jizhong Han, Ruixuan Li
21 Jan 2019
Exascale Deep Learning for Climate Analytics
Thorsten Kurth, Sean Treichler, Josh Romero, M. Mudigonda, Nathan Luehr, ..., Michael A. Matheson, J. Deslippe, M. Fatica, P. Prabhat, Michael Houston
BDL
03 Oct 2018
Recent Advances in Object Detection in the Age of Deep Convolutional Neural Networks
Shivang Agarwal, Jean Ogier du Terrail, F. Jurie
ObjD
10 Sep 2018
PFDet: 2nd Place Solution to Open Images Challenge 2018 Object Detection Track
Takuya Akiba, Tommi Kerola, Yusuke Niitani, Toru Ogawa, Shotaro Sano, Shuji Suzuki
04 Sep 2018
Parallax: Sparsity-aware Data Parallel Training of Deep Neural Networks
Soojeong Kim, Gyeong-In Yu, Hojin Park, Sungwoo Cho, Eunji Jeong, Hyeonmin Ha, Sanha Lee, Joo Seong Jeong, Byung-Gon Chun
08 Aug 2018
Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark
Cody Coleman, Daniel Kang, Deepak Narayanan, Luigi Nardi, Tian Zhao, Jian Zhang, Peter Bailis, K. Olukotun, Christopher Ré, Matei A. Zaharia
04 Jun 2018
SmoothOut: Smoothing Out Sharp Minima to Improve Generalization in Deep Learning
W. Wen, Yandan Wang, Feng Yan, Cong Xu, Chunpeng Wu, Yiran Chen, H. Li
21 May 2018
SparCML: High-Performance Sparse Communication for Machine Learning
Cédric Renggli, Saleh Ashkboos, Mehdi Aghagolzadeh, Dan Alistarh, Torsten Hoefler
22 Feb 2018
signSGD: Compressed Optimisation for Non-Convex Problems
Jeremy Bernstein, Yu-Xiang Wang, Kamyar Azizzadenesheli, Anima Anandkumar
FedML, ODL
13 Feb 2018
Improving Generalization Performance by Switching from Adam to SGD
N. Keskar, R. Socher
ODL
20 Dec 2017
Efficient Training of Convolutional Neural Nets on Large Distributed Systems
Sameer Kumar, D. Sreedhar, Vaibhav Saxena, Yogish Sabharwal, Ashish Verma
02 Nov 2017
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu, M. Schuster, Z. Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean
AIMat
26 Sep 2016