Integrated Model, Batch and Domain Parallelism in Training Neural Networks
12 December 2017
A. Gholami, A. Azad, Peter H. Jin, Kurt Keutzer, A. Buluç

Papers citing "Integrated Model, Batch and Domain Parallelism in Training Neural Networks"

50 of 52 citing papers shown
Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training
Jared Fernandez, Luca Wehrstedt, Leonid Shamis, Mostafa Elhoushi, Kalyan Saladi, Yonatan Bisk, Emma Strubell, Jacob Kahn
20 Nov 2024
Scalable Artificial Intelligence for Science: Perspectives, Methods and Exemplars
Wesley Brewer, Aditya Kashi, Sajal Dash, A. Tsaris, Junqi Yin, Mallikarjun Shankar, Feiyi Wang
24 Jun 2024
Neural Network Methods for Radiation Detectors and Imaging
Frontiers of Physics (Front. Phys.), 2023
S. Lin, S. Ning, H. Zhu, T. Zhou, C. L. Morris, S. Clayton, M. Cherukara, R. T. Chen, Z. Wang
09 Nov 2023
Distributed Matrix-Based Sampling for Graph Neural Network Training
Conference on Machine Learning and Systems (MLSys), 2023
Alok Tripathy, Katherine Yelick, A. Buluç
06 Nov 2023
THC: Accelerating Distributed Deep Learning Using Tensor Homomorphic Compression
Symposium on Networked Systems Design and Implementation (NSDI), 2023
Minghao Li, Ran Ben-Basat, S. Vargaftik, Chon-In Lao, Ke Xu, Michael Mitzenmacher, Minlan Yu (Harvard University)
16 Feb 2023
LOFT: Finding Lottery Tickets through Filter-wise Training
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
Qihan Wang, Chen Dun, Fangshuo Liao, C. Jermaine, Anastasios Kyrillidis
28 Oct 2022
OLLA: Optimizing the Lifetime and Location of Arrays to Reduce the Memory Usage of Neural Networks
Benoit Steiner, Mostafa Elhoushi, Jacob Kahn, James Hegarty
24 Oct 2022
Sequential Aggregation and Rematerialization: Distributed Full-batch Training of Graph Neural Networks on Large Graphs
Conference on Machine Learning and Systems (MLSys), 2021
Hesham Mostafa
11 Nov 2021
Model-Parallel Model Selection for Deep Learning Systems
Kabir Nagrecha
14 Jul 2021
ResIST: Layer-Wise Decomposition of ResNets for Distributed Training
Chen Dun, Cameron R. Wolfe, C. Jermaine, Anastasios Kyrillidis
02 Jul 2021
Inductive Predictions of Extreme Hydrologic Events in The Wabash River Watershed
Nicholas Majeske, B. Abesh, Chen Zhu, A. Azad
25 Apr 2021
An Oracle for Guiding Large-Scale Model/Hybrid Parallel Training of Convolutional Neural Networks
IEEE International Symposium on High-Performance Parallel Distributed Computing (HPDC), 2020
A. Kahira, Truong Thao Nguyen, L. Bautista-Gomez, Ryousei Takano, Rosa M. Badia, Mohamed Wahib
19 Apr 2021
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2021
Deepak Narayanan, Mohammad Shoeybi, Jared Casper, P. LeGresley, M. Patwary, ..., Prethvi Kashinkunti, J. Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei A. Zaharia
09 Apr 2021
GIST: Distributed Training for Large-Scale Graph Convolutional Networks
Journal of Applied and Computational Topology (JACT), 2021
Cameron R. Wolfe, Jingkang Yang, Arindam Chowdhury, Chen Dun, Artun Bayer, Santiago Segarra, Anastasios Kyrillidis
20 Feb 2021
Local Critic Training for Model-Parallel Learning of Deep Neural Networks
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2021
Hojung Lee, Cho-Jui Hsieh, Jong-Seok Lee
03 Feb 2021
Ship Detection: Parameter Server Variant
Benjamin Smith
02 Dec 2020
Integrating Deep Learning in Domain Sciences at Exascale
IEEE International Conference on Systems, Man and Cybernetics (SMC), 2020
Rick Archibald, E. Chow, E. D'Azevedo, Jack J. Dongarra, M. Eisenbach, ..., Florent Lopez, Daniel Nichols, S. Tomov, Kwai Wong, Junqi Yin
23 Nov 2020
Scaling Distributed Deep Learning Workloads beyond the Memory Capacity with KARMA
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2020
Mohamed Wahib, Haoyu Zhang, Truong Thao Nguyen, Aleksandr Drozd, Jens Domke, Lingqi Zhang, Ryousei Takano, Satoshi Matsuoka
26 Aug 2020
A Computational-Graph Partitioning Method for Training Memory-Constrained DNNs
Fareed Qararyah, Mohamed Wahib, Douga Dikbayir, M. E. Belviranli, Didem Unat
19 Aug 2020
The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism
IEEE Transactions on Parallel and Distributed Systems (TPDS), 2020
Yosuke Oyama, N. Maruyama, Nikoli Dryden, Erin McCarthy, P. Harrington, J. Balewski, Satoshi Matsuoka, Peter Nugent, B. Van Essen
25 Jul 2020
ICA-UNet: ICA Inspired Statistical UNet for Real-time 3D Cardiac Cine MRI Segmentation
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2020
Tianchen Wang, Xiaowei Xu, Jinjun Xiong, Qianjun Jia, Haiyun Yuan, Meiping Huang, Jian Zhuang, Yiyu Shi
18 Jul 2020
Data Movement Is All You Need: A Case Study on Optimizing Transformers
A. Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, Torsten Hoefler
30 Jun 2020
Reducing Communication in Graph Neural Network Training
Alok Tripathy, Katherine Yelick, A. Buluç
07 May 2020
Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging
IEEE Transactions on Parallel and Distributed Systems (TPDS), 2020
Shigang Li, Tal Ben-Nun, Giorgi Nadiradze, Salvatore Di Girolamo, Nikoli Dryden, Dan Alistarh, Torsten Hoefler
30 Apr 2020
HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training using TensorFlow
A. A. Awan, Arpan Jain, Quentin G. Anthony, Hari Subramoni, Dhabaleswar K. Panda
12 Nov 2019
Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization
Conference on Machine Learning and Systems (MLSys), 2019
Paras Jain, Ajay Jain, Aniruddha Nrusimha, A. Gholami, Pieter Abbeel, Kurt Keutzer, Ion Stoica, Joseph E. Gonzalez
07 Oct 2019
Distributed Equivalent Substitution Training for Large-Scale Recommender Systems
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2019
Haidong Rong, Yangzihao Wang, Feihu Zhou, Junjie Zhai, Haiyang Wu, ..., Fan Li, Han Zhang, Yuekui Yang, Zhenyu Guo, Di Wang
10 Sep 2019
Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training
IEEE Micro, 2019
Saptadeep Pal, Eiman Ebrahimi, A. Zulfiqar, Yaosheng Fu, Victor Zhang, Szymon Migacz, D. Nellans, Puneet Gupta
30 Jul 2019
Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models: Extension
Neural Information Processing Systems (NeurIPS), 2019
Yunfei Teng, Wenbo Gao, F. Chalus, A. Choromańska, Shiqian Ma, Adrian Weller
24 May 2019
Improving Strong-Scaling of CNN Training by Exploiting Finer-Grained Parallelism
Nikoli Dryden, N. Maruyama, Tom Benson, Tim Moon, M. Snir, B. Van Essen
15 Mar 2019
Inefficiency of K-FAC for Large Batch Size Training
Linjian Ma, Gabe Montague, Jiayu Ye, Z. Yao, A. Gholami, Kurt Keutzer, Michael W. Mahoney
14 Mar 2019
Parameter Re-Initialization through Cyclical Batch Size Schedules
Norman Mu, Z. Yao, A. Gholami, Kurt Keutzer, Michael W. Mahoney
04 Dec 2018
Mesh-TensorFlow: Deep Learning for Supercomputers
Noam M. Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, ..., HyoukJoong Lee, O. Milenkovic, C. Young, Ryan Sepassi, Blake Hechtman
05 Nov 2018
Large batch size training of neural networks with adversarial training and second-order information
Z. Yao, A. Gholami, Daiyaan Arfeen, Richard Liaw, Alfons Kemper, Kurt Keutzer, Michael W. Mahoney
02 Oct 2018
SqueezeNext: Hardware-Aware Neural Network Design
A. Gholami, K. Kwon, Bichen Wu, Zizheng Tai, Xiangyu Yue, Peter H. Jin, Sicheng Zhao, Kurt Keutzer
23 Mar 2018
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
ACM Computing Surveys (CSUR), 2018
Tal Ben-Nun, Torsten Hoefler
26 Feb 2018
Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
Z. Yao, A. Gholami, Qi Lei, Kurt Keutzer, Michael W. Mahoney
22 Feb 2018
SqueezeSeg: Convolutional Neural Nets with Recurrent CRF for Real-Time Road-Object Segmentation from 3D LiDAR Point Cloud
Bichen Wu, Alvin Wan, Xiangyu Yue, Kurt Keutzer
19 Oct 2017
Large Batch Training of Convolutional Networks
Yang You, Igor Gitman, Boris Ginsburg
13 Aug 2017
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal, Piotr Dollár, Ross B. Girshick, P. Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He
08 Jun 2017
SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving
Bichen Wu, Alvin Wan, F. Iandola, Peter H. Jin, Kurt Keutzer
04 Dec 2016
How to scale distributed deep learning?
Peter H. Jin, Qiaochu Yuan, F. Iandola, Kurt Keutzer
14 Nov 2016
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
15 Sep 2016
Fully Convolutional Networks for Semantic Segmentation
Evan Shelhamer, Jonathan Long, Trevor Darrell
20 May 2016
Distributed Deep Learning Using Synchronous Stochastic Gradient Descent
Dipankar Das, Sasikanth Avancha, Dheevatsa Mudigere, K. Vaidyanathan, Srinivas Sridharan, Dhiraj D. Kalamkar, Bharat Kaul, Pradeep Dubey
22 Feb 2016
Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
10 Dec 2015
Accurate Image Super-Resolution Using Very Deep Convolutional Networks
Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee
14 Nov 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2015
Shaoqing Ren, Kaiming He, Ross B. Girshick, Jian Sun
04 Jun 2015
Brain Tumor Segmentation with Deep Neural Networks
Mohammad Havaei, Axel Davy, David Warde-Farley, A. Biard, Aaron Courville, Yoshua Bengio, C. Pal, Pierre-Marc Jodoin, Hugo Larochelle
13 May 2015
Deep learning with Elastic Averaging SGD
Neural Information Processing Systems (NeurIPS), 2014
Sixin Zhang, A. Choromańska, Yann LeCun
20 Dec 2014