Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks
Zhihao Jia, Sina Lin, C. Qi, A. Aiken
arXiv:1802.04924, 14 February 2018

Papers citing "Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks"

48 papers

Automatic Operator-level Parallelism Planning for Distributed Deep Learning -- A Mixed-Integer Programming Approach
Ruifeng She, Bowen Pang, Kai Li, Zehua Liu, Tao Zhong
12 Mar 2025

Flexible Coded Distributed Convolution Computing for Enhanced Fault Tolerance and Numerical Stability in Distributed CNNs
Shuo Tan, Rui Liu, XianLei Long, Kai Wan, Linqi Song, Yong Li
03 Nov 2024

Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, ..., Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, Peng Sun
29 Jul 2024

PaSE: Parallelization Strategies for Efficient DNN Training
Venmugil Elango
04 Jul 2024

A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures
Fabrizio Ferrandi, S. Curzel, Leandro Fiorin, Daniele Ielmini, Cristina Silvano, ..., Salvatore Filippone, F. L. Presti, Francesco Silvestri, P. Palazzari, Stefania Perri
29 Nov 2023

AMSP: Reducing Communication Overhead of ZeRO for Efficient LLM Training
Qiaoling Chen, Qi Hu, Guoteng Wang, Zhisheng Ye, Ting Huang, ..., Yang Gao, Hang Yan, Yonggang Wen, Tianwei Zhang, Peng Sun
01 Nov 2023

UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming
Hao Lin, Ke Wu, Jie Li, Jun Yu Li, Wu-Jun Li
31 Jul 2023

Improving Automatic Parallel Training via Balanced Memory Workload Optimization
Yujie Wang, Youhe Jiang, Xupeng Miao, Fangcheng Fu, Shenhan Zhu, Xiaonan Nie, Yaofeng Tu, Bin Cui
05 Jul 2023

Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models
Daochen Zha, Louis Feng, Liangchen Luo, Bhargav Bhushanam, Zirui Liu, ..., J. McMahon, Yuzhen Huang, Bryan Clarke, A. Kejariwal, Xia Hu
03 May 2023

DISCO: Distributed Inference with Sparse Communications
Minghai Qin, Chaowen Sun, Jaco A. Hofmann, D. Vučinić
Tags: FedML
22 Feb 2023

AutoDDL: Automatic Distributed Deep Learning with Near-Optimal Bandwidth Cost
Jinfan Chen, Shigang Li, Ran Guo, Jinhui Yuan, Torsten Hoefler
17 Jan 2023

Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism
Xupeng Miao, Yujie Wang, Youhe Jiang, Chunan Shi, Xiaonan Nie, Hailin Zhang, Bin Cui
Tags: GNN, MoE
25 Nov 2022

DreamShard: Generalizable Embedding Table Placement for Recommender Systems
Daochen Zha, Louis Feng, Qiaoyu Tan, Zirui Liu, Kwei-Herng Lai, Bhargav Bhushanam, Yuandong Tian, A. Kejariwal, Xia Hu
Tags: LMTD, OffRL
05 Oct 2022

Layer-Wise Partitioning and Merging for Efficient and Scalable Deep Learning
S. Akintoye, Liangxiu Han, H. Lloyd, Xin Zhang, Darren Dancey, Haoming Chen, Daoqiang Zhang
Tags: FedML
22 Jul 2022

MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud
Zhen Zhang, Shuai Zheng, Yida Wang, Justin Chiu, George Karypis, Trishul M. Chilimbi, Mu Li, Xin Jin
30 Apr 2022

FuncPipe: A Pipelined Serverless Framework for Fast and Cost-efficient Training of Deep Learning Models
Yunzhuo Liu, Bo Jiang, Tian Guo, Zimeng Huang, Wen-ping Ma, Xinbing Wang, Chenghu Zhou
28 Apr 2022

Hercules: Heterogeneity-Aware Inference Serving for At-Scale Personalized Recommendation
Liu Ke, Udit Gupta, Mark Hempstead, Carole-Jean Wu, Hsien-Hsin S. Lee, Xuan Zhang
14 Mar 2022

BagPipe: Accelerating Deep Recommendation Model Training
Saurabh Agarwal, Chengpo Yan, Ziyi Zhang, Shivaram Venkataraman
24 Feb 2022

DistrEdge: Speeding up Convolutional Neural Network Inference on Distributed Edge Devices
Xueyu Hou, Yongjie Guan, Tao Han, Ning Zhang
03 Feb 2022

End-to-end Adaptive Distributed Training on PaddlePaddle
Yulong Ao, Zhihua Wu, Dianhai Yu, Weibao Gong, Zhiqing Kui, ..., Yanjun Ma, Tian Wu, Haifeng Wang, Wei Zeng, Chao Yang
06 Dec 2021

A Survey of Large-Scale Deep Learning Serving System Optimization: Challenges and Opportunities
Fuxun Yu, Di Wang, Longfei Shangguan, Minjia Zhang, Xulong Tang, Chenchen Liu, Xiang Chen
28 Nov 2021

Collage: Seamless Integration of Deep Learning Backends with Automatic Placement
Byungsoo Jeon, Sunghyun Park, Peiyuan Liao, Sheng Xu, Tianqi Chen, Zhihao Jia
Tags: VLM
01 Nov 2021

OneFlow: Redesign the Distributed Deep Learning Framework from Scratch
Jinhui Yuan, Xinqi Li, Cheng Cheng, Juncheng Liu, Ran Guo, ..., Fei Yang, Xiaodong Yi, Chuan Wu, Haoran Zhang, Jie Zhao
28 Oct 2021

Enabling Large Batch Size Training for DNN Models Beyond the Memory Limit While Maintaining Performance
Nathanaël Fijalkow, DoangJoo Synn, Jooyoung Park, Jong-Kook Kim
24 Oct 2021

Partitioning sparse deep neural networks for scalable training and inference
G. Demirci, Hakan Ferhatosmanoglu
23 Apr 2021

An Oracle for Guiding Large-Scale Model/Hybrid Parallel Training of Convolutional Neural Networks
A. Kahira, Truong Thao Nguyen, L. Bautista-Gomez, Ryousei Takano, Rosa M. Badia, M. Wahib
19 Apr 2021

Creating Robust Deep Neural Networks With Coded Distributed Computing for IoT Systems
Ramyad Hadidi, Jiashen Cao, Hyesoon Kim
09 Apr 2021

Automatic Graph Partitioning for Very Large-scale Deep Learning
Masahiro Tanaka, Kenjiro Taura, T. Hanawa, Kentaro Torisawa
Tags: GNN, AI4CE
30 Mar 2021

On the Utility of Gradient Compression in Distributed Training Systems
Saurabh Agarwal, Hongyi Wang, Shivaram Venkataraman, Dimitris Papailiopoulos
28 Feb 2021

TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models
Zhuohan Li, Siyuan Zhuang, Shiyuan Guo, Danyang Zhuo, Hao Zhang, D. Song, Ion Stoica
Tags: MoE
16 Feb 2021

Local Critic Training for Model-Parallel Learning of Deep Neural Networks
Hojung Lee, Cho-Jui Hsieh, Jong-Seok Lee
03 Feb 2021

BaPipe: Exploration of Balanced Pipeline Parallelism for DNN Training
Letian Zhao, Rui Xu, Tianqi Wang, Teng Tian, Xiaotian Wang, Wei Wu, Chio-in Ieong, Xi Jin
Tags: MoE
23 Dec 2020

Scaling Distributed Deep Learning Workloads beyond the Memory Capacity with KARMA
M. Wahib, Haoyu Zhang, Truong Thao Nguyen, Aleksandr Drozd, Jens Domke, Lingqi Zhang, Ryousei Takano, Satoshi Matsuoka
Tags: OODD
26 Aug 2020

A Computational-Graph Partitioning Method for Training Memory-Constrained DNNs
Fareed Qararyah, M. Wahib, Douga Dikbayir, M. E. Belviranli, D. Unat
19 Aug 2020

Efficient Algorithms for Device Placement of DNN Graph Operators
Jakub Tarnawski, Amar Phanishayee, Nikhil R. Devanur, Divya Mahajan, Fanny Nina Paravecino
29 Jun 2020

Energy-Aware DNN Graph Optimization
Yu Wang, Rong Ge, Shuang Qiu
Tags: GNN
12 May 2020

Hot-Starting the Ac Power Flow with Convolutional Neural Networks
Liangjie Chen, J. Tate
Tags: AI4CE
20 Apr 2020

TensorOpt: Exploring the Tradeoffs in Distributed DNN Training with Auto-Parallelism
Zhenkun Cai, Kaihao Ma, Xiao Yan, Yidi Wu, Yuzhen Huang, James Cheng, Teng Su, F. Yu
16 Apr 2020

Ordering Chaos: Memory-Aware Scheduling of Irregularly Wired Neural Networks for Edge Devices
Byung Hoon Ahn, Jinwon Lee, J. Lin, Hsin-Pai Cheng, Jilei Hou, H. Esmaeilzadeh
04 Mar 2020

Simulating Performance of ML Systems with Offline Profiling
Hong-Jie Huang, Peng Cheng, Hong Xu, Y. Xiong
Tags: OffRL
17 Feb 2020

Distributed Equivalent Substitution Training for Large-Scale Recommender Systems
Haidong Rong, Yangzihao Wang, Feihu Zhou, Junjie Zhai, Haiyang Wu, ..., Fan Li, Han Zhang, Yuekui Yang, Zhenyu Guo, Di Wang
Tags: OffRL
10 Sep 2019

Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training
Saptadeep Pal, Eiman Ebrahimi, A. Zulfiqar, Yaosheng Fu, Victor Zhang, Szymon Migacz, D. Nellans, Puneet Gupta
30 Jul 2019

Database Meets Deep Learning: Challenges and Opportunities
Wei Wang, Meihui Zhang, Gang Chen, H. V. Jagadish, Beng Chin Ooi, K. Tan
21 Jun 2019

Accelerated Training for CNN Distributed Deep Learning through Automatic Resource-Aware Layer Placement
Jay H. Park, Sunghwan Kim, Jinwon Lee, Myeongjae Jeon, S. Noh
17 Jan 2019

Mesh-TensorFlow: Deep Learning for Supercomputers
Noam M. Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, ..., HyoukJoong Lee, O. Milenkovic, C. Young, Ryan Sepassi, Blake Hechtman
Tags: GNN, MoE, AI4CE
05 Nov 2018

Efficient and Robust Parallel DNN Training through Model Parallelism on Multi-GPU Platform
Chi-Chung Chen, Chia-Lin Yang, Hsiang-Yun Cheng
08 Sep 2018

Supporting Very Large Models using Automatic Dataflow Graph Partitioning
Minjie Wang, Chien-chin Huang, Jinyang Li
24 Jul 2018

Beyond Data and Model Parallelism for Deep Neural Networks
Zhihao Jia, Matei A. Zaharia, A. Aiken
Tags: GNN, AI4CE
14 Jul 2018