1712.04432
v4 (latest)
Integrated Model, Batch and Domain Parallelism in Training Neural Networks
12 December 2017
A. Gholami
A. Azad
Peter H. Jin
Kurt Keutzer
A. Buluç
Papers citing
"Integrated Model, Batch and Domain Parallelism in Training Neural Networks"
50 / 52 papers shown
Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training
Jared Fernandez
Luca Wehrstedt
Leonid Shamis
Mostafa Elhoushi
Kalyan Saladi
Yonatan Bisk
Emma Strubell
Jacob Kahn
1.3K
12
0
20 Nov 2024
Scalable Artificial Intelligence for Science: Perspectives, Methods and Exemplars
Wesley Brewer
Aditya Kashi
Sajal Dash
A. Tsaris
Junqi Yin
Mallikarjun Shankar
Feiyi Wang
210
1
0
24 Jun 2024
Neural Network Methods for Radiation Detectors and Imaging
Frontiers of Physics (Front. Phys.), 2023
S. Lin
S. Ning
H. Zhu
T. Zhou
C. L. Morris
S. Clayton
M. Cherukara
R. T. Chen
Z. Wang
AI4CE
286
11
0
09 Nov 2023
Distributed Matrix-Based Sampling for Graph Neural Network Training
Conference on Machine Learning and Systems (MLSys), 2023
Alok Tripathy
Katherine Yelick
A. Buluç
251
9
0
06 Nov 2023
THC: Accelerating Distributed Deep Learning Using Tensor Homomorphic Compression
Symposium on Networked Systems Design and Implementation (NSDI), 2023
Minghao Li
Ran Ben-Basat
S. Vargaftik
Chon-In Lao
Ke Xu
Michael Mitzenmacher
Minlan Yu (Harvard University)
430
29
0
16 Feb 2023
LOFT: Finding Lottery Tickets through Filter-wise Training
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
Qihan Wang
Chen Dun
Fangshuo Liao
C. Jermaine
Anastasios Kyrillidis
204
4
0
28 Oct 2022
OLLA: Optimizing the Lifetime and Location of Arrays to Reduce the Memory Usage of Neural Networks
Benoit Steiner
Mostafa Elhoushi
Jacob Kahn
James Hegarty
313
10
0
24 Oct 2022
Sequential Aggregation and Rematerialization: Distributed Full-batch Training of Graph Neural Networks on Large Graphs
Conference on Machine Learning and Systems (MLSys), 2021
Hesham Mostafa
GNN
332
30
0
11 Nov 2021
Model-Parallel Model Selection for Deep Learning Systems
Kabir Nagrecha
218
19
0
14 Jul 2021
ResIST: Layer-Wise Decomposition of ResNets for Distributed Training
Chen Dun
Cameron R. Wolfe
C. Jermaine
Anastasios Kyrillidis
400
23
0
02 Jul 2021
Inductive Predictions of Extreme Hydrologic Events in The Wabash River Watershed
Nicholas Majeske
B. Abesh
Chen Zhu
A. Azad
64
1
0
25 Apr 2021
An Oracle for Guiding Large-Scale Model/Hybrid Parallel Training of Convolutional Neural Networks
IEEE International Symposium on High-Performance Parallel Distributed Computing (HPDC), 2020
A. Kahira
Truong Thao Nguyen
L. Bautista-Gomez
Ryousei Takano
Rosa M. Badia
Mohamed Wahib
227
14
0
19 Apr 2021
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2021
Deepak Narayanan
Mohammad Shoeybi
Jared Casper
P. LeGresley
M. Patwary
...
Prethvi Kashinkunti
J. Bernauer
Bryan Catanzaro
Amar Phanishayee
Matei A. Zaharia
MoE
918
1,114
0
09 Apr 2021
GIST: Distributed Training for Large-Scale Graph Convolutional Networks
Journal of Applied and Computational Topology (JACT), 2021
Cameron R. Wolfe
Jingkang Yang
Arindam Chowdhury
Chen Dun
Artun Bayer
Santiago Segarra
Anastasios Kyrillidis
BDL
GNN
LRM
418
11
0
20 Feb 2021
Local Critic Training for Model-Parallel Learning of Deep Neural Networks
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2021
Hojung Lee
Cho-Jui Hsieh
Jong-Seok Lee
237
18
0
03 Feb 2021
Ship Detection: Parameter Server Variant
Benjamin Smith
135
0
0
02 Dec 2020
Integrating Deep Learning in Domain Sciences at Exascale
IEEE International Conference on Systems, Man and Cybernetics (SMC), 2020
Rick Archibald
E. Chow
E. D'Azevedo
Jack J. Dongarra
M. Eisenbach
...
Florent Lopez
Daniel Nichols
S. Tomov
Kwai Wong
Junqi Yin
PINN
179
5
0
23 Nov 2020
Scaling Distributed Deep Learning Workloads beyond the Memory Capacity with KARMA
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2020
Mohamed Wahib
Haoyu Zhang
Truong Thao Nguyen
Aleksandr Drozd
Jens Domke
Lingqi Zhang
Ryousei Takano
Satoshi Matsuoka
OODD
253
24
0
26 Aug 2020
A Computational-Graph Partitioning Method for Training Memory-Constrained DNNs
Fareed Qararyah
Mohamed Wahib
Doğa Dikbayır
M. E. Belviranli
Didem Unat
281
10
0
19 Aug 2020
The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism
IEEE Transactions on Parallel and Distributed Systems (TPDS), 2020
Yosuke Oyama
N. Maruyama
Nikoli Dryden
Erin McCarthy
P. Harrington
J. Balewski
Satoshi Matsuoka
Peter Nugent
B. Van Essen
3DV
AI4CE
223
42
0
25 Jul 2020
ICA-UNet: ICA Inspired Statistical UNet for Real-time 3D Cardiac Cine MRI Segmentation
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2020
Tianchen Wang
Xiaowei Xu
Jinjun Xiong
Qianjun Jia
Haiyun Yuan
Meiping Huang
Jian Zhuang
Yiyu Shi
231
24
0
18 Jul 2020
Data Movement Is All You Need: A Case Study on Optimizing Transformers
A. Ivanov
Nikoli Dryden
Tal Ben-Nun
Shigang Li
Torsten Hoefler
475
179
0
30 Jun 2020
Reducing Communication in Graph Neural Network Training
Alok Tripathy
Katherine Yelick
A. Buluç
GNN
389
122
0
07 May 2020
Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging
IEEE Transactions on Parallel and Distributed Systems (TPDS), 2020
Shigang Li
Tal Ben-Nun
Giorgi Nadiradze
Salvatore Di Girolamo
Nikoli Dryden
Dan Alistarh
Torsten Hoefler
490
15
0
30 Apr 2020
HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training using TensorFlow
A. A. Awan
Arpan Jain
Quentin G. Anthony
Hari Subramoni
Dhabaleswar K. Panda
MoE
AI4CE
411
5
0
12 Nov 2019
Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization
Conference on Machine Learning and Systems (MLSys), 2019
Paras Jain
Ajay Jain
Aniruddha Nrusimha
A. Gholami
Pieter Abbeel
Kurt Keutzer
Ion Stoica
Joseph E. Gonzalez
325
237
0
07 Oct 2019
Distributed Equivalent Substitution Training for Large-Scale Recommender Systems
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2019
Haidong Rong
Yangzihao Wang
Feihu Zhou
Junjie Zhai
Haiyang Wu
...
Fan Li
Han Zhang
Yuekui Yang
Zhenyu Guo
Di Wang
OffRL
306
13
0
10 Sep 2019
Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training
IEEE Micro (IEEE Micro), 2019
Saptadeep Pal
Eiman Ebrahimi
A. Zulfiqar
Yaosheng Fu
Victor Zhang
Szymon Migacz
D. Nellans
Puneet Gupta
345
69
0
30 Jul 2019
Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models: Extension
Neural Information Processing Systems (NeurIPS), 2019
Yunfei Teng
Wenbo Gao
F. Chalus
A. Choromańska
Shiqian Ma
Adrian Weller
582
15
0
24 May 2019
Improving Strong-Scaling of CNN Training by Exploiting Finer-Grained Parallelism
Nikoli Dryden
N. Maruyama
Tom Benson
Tim Moon
M. Snir
B. Van Essen
281
52
0
15 Mar 2019
Inefficiency of K-FAC for Large Batch Size Training
Linjian Ma
Gabe Montague
Jiayu Ye
Z. Yao
A. Gholami
Kurt Keutzer
Michael W. Mahoney
334
24
0
14 Mar 2019
Parameter Re-Initialization through Cyclical Batch Size Schedules
Norman Mu
Z. Yao
A. Gholami
Kurt Keutzer
Michael W. Mahoney
ODL
244
8
0
04 Dec 2018
Mesh-TensorFlow: Deep Learning for Supercomputers
Noam M. Shazeer
Youlong Cheng
Niki Parmar
Dustin Tran
Ashish Vaswani
...
HyoukJoong Lee
O. Milenkovic
C. Young
Ryan Sepassi
Blake Hechtman
GNN
MoE
AI4CE
307
431
0
05 Nov 2018
Large batch size training of neural networks with adversarial training and second-order information
Z. Yao
A. Gholami
Daiyaan Arfeen
Richard Liaw
Alfons Kemper
Kurt Keutzer
Michael W. Mahoney
ODL
348
46
0
02 Oct 2018
SqueezeNext: Hardware-Aware Neural Network Design
A. Gholami
K. Kwon
Bichen Wu
Zizheng Tai
Xiangyu Yue
Peter H. Jin
Sicheng Zhao
Kurt Keutzer
235
324
0
23 Mar 2018
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
ACM Computing Surveys (CSUR), 2018
Tal Ben-Nun
Torsten Hoefler
GNN
486
784
0
26 Feb 2018
Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
Z. Yao
A. Gholami
Qi Lei
Kurt Keutzer
Michael W. Mahoney
558
186
0
22 Feb 2018
SqueezeSeg: Convolutional Neural Nets with Recurrent CRF for Real-Time Road-Object Segmentation from 3D LiDAR Point Cloud
Bichen Wu
Alvin Wan
Xiangyu Yue
Kurt Keutzer
3DPC
327
905
0
19 Oct 2017
Large Batch Training of Convolutional Networks
Yang You
Igor Gitman
Boris Ginsburg
ODL
898
931
0
13 Aug 2017
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
3DH
819
4,049
0
08 Jun 2017
SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving
Bichen Wu
Alvin Wan
F. Iandola
Peter H. Jin
Kurt Keutzer
526
533
0
04 Dec 2016
How to scale distributed deep learning?
Peter H. Jin
Qiaochu Yuan
F. Iandola
Kurt Keutzer
3DH
198
138
0
14 Nov 2016
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
1.4K
3,373
0
15 Sep 2016
Fully Convolutional Networks for Semantic Segmentation
Evan Shelhamer
Jonathan Long
Trevor Darrell
VOS
SSeg
1.3K
41,589
0
20 May 2016
Distributed Deep Learning Using Synchronous Stochastic Gradient Descent
Dipankar Das
Sasikanth Avancha
Dheevatsa Mudigere
K. Vaidyanathan
Srinivas Sridharan
Dhiraj D. Kalamkar
Bharat Kaul
Pradeep Dubey
GNN
250
183
0
22 Feb 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
4.2K
226,071
0
10 Dec 2015
Accurate Image Super-Resolution Using Very Deep Convolutional Networks
Jiwon Kim
Jung Kwon Lee
Kyoung Mu Lee
SupR
949
6,921
0
14 Nov 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2015
Shaoqing Ren
Kaiming He
Ross B. Girshick
Jian Sun
AIMat
ObjD
3.6K
71,917
0
04 Jun 2015
Brain Tumor Segmentation with Deep Neural Networks
Mohammad Havaei
Axel Davy
David Warde-Farley
A. Biard
Aaron Courville
Yoshua Bengio
C. Pal
Pierre-Marc Jodoin
Hugo Larochelle
3DV
519
3,057
0
13 May 2015
Deep learning with Elastic Averaging SGD
Neural Information Processing Systems (NeurIPS), 2014
Sixin Zhang
A. Choromańska
Yann LeCun
FedML
978
639
0
20 Dec 2014
Page 1 of 2