arXiv:1901.02067 (v2)
HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array
Linghao Song, Jiachen Mao, Youwei Zhuo, Xuehai Qian, Hai Helen Li, Yiran Chen
7 January 2019
Papers citing "HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array" (21 papers)

- Distributed Deep Learning using Stochastic Gradient Staleness. Viet Hoang Pham, Hyo-Sung Ahn. 06 Sep 2025.
- Efficient Training of Large Language Models on Distributed Infrastructures: A Survey. Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, ..., Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, Yang Liu. 29 Jul 2024.
- HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis. European Conference on Computer Systems (EuroSys), 2024. Shiwei Zhang, Lansong Diao, Chuan Wu, Zongyan Cao, Siyu Wang, Jialin Li. 11 Jan 2024.
- DEAP: Design Space Exploration for DNN Accelerator Parallelism. Ekansh Agrawal, Xiangyu Sam Xu. 24 Dec 2023.
- A Survey From Distributed Machine Learning to Distributed Deep Learning. Mohammad Dehghani, Zahra Yazdanparast. 11 Jul 2023.
- KAPLA: Pragmatic Representation and Fast Solving of Scalable NN Accelerator Dataflow. Zhiyao Li, Mingyu Gao. 09 Jun 2023.
- Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression. International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2023. Jaeyong Song, Jinkyu Yim, Jaewon Jung, Hongsun Jang, H. Kim, Youngsok Kim, Jinho Lee. 24 Jan 2023. [GNN]
- Demystifying Map Space Exploration for NPUs. IEEE International Symposium on Workload Characterization (IISWC), 2022. Sheng-Chun Kao, A. Parashar, Po-An Tsai, T. Krishna. 07 Oct 2022.
- Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design. Micro (MICRO), 2022. Hongxiang Fan, Thomas C. P. Chau, Stylianos I. Venieris, Royson Lee, Alexandros Kouris, Wayne Luk, Nicholas D. Lane, Mohamed S. Abdelfattah. 20 Sep 2022.
- Layer-Wise Partitioning and Merging for Efficient and Scalable Deep Learning. Social Science Research Network (SSRN), 2022. S. Akintoye, Liangxiu Han, H. Lloyd, Xin Zhang, Darren Dancey, Haoming Chen, Daoqiang Zhang. 22 Jul 2022. [FedML]
- Special Session: Towards an Agile Design Methodology for Efficient, Reliable, and Secure ML Systems. IEEE VLSI Test Symposium (VTS), 2022. Shail Dave, Alberto Marchisio, Muhammad Abdullah Hanif, Amira Guesmi, Aviral Shrivastava, Ihsen Alouani, Mohamed Bennai. 18 Apr 2022.
- A Survey and Empirical Evaluation of Parallel Deep Learning Frameworks. Daniel Nichols, Siddharth Singh, Shuqing Lin, A. Bhatele. 09 Nov 2021. [OOD]
- Shift-BNN: Highly-Efficient Probabilistic Bayesian Neural Network Training via Memory-Friendly Pattern Retrieving. Qiyu Wan, Haojun Xia, Xingyao Zhang, Lening Wang, Shuaiwen Leon Song, Xin Fu. 07 Oct 2021. [OOD]
- Sextans: A Streaming Accelerator for General-Purpose Sparse-Matrix Dense-Matrix Multiplication. Symposium on Field Programmable Gate Arrays (FPGA), 2021. Linghao Song, Yuze Chi, Atefeh Sohrabizadeh, Young-kyu Choi, Jason Lau, Jason Cong. 22 Sep 2021. [GNN]
- FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks. Sheng-Chun Kao, Suvinay Subramanian, Gaurav Agrawal, Amir Yazdanbakhsh, T. Krishna. 13 Jul 2021.
- GPTPU: Accelerating Applications using Edge Tensor Processing Units. International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2021. Kuan-Chieh Hsu, Hung-Wei Tseng. 22 Jun 2021.
- RecPipe: Co-designing Models and Hardware to Jointly Optimize Recommendation Quality and Performance. Micro (MICRO), 2021. Udit Gupta, Samuel Hsia, J. Zhang, Mark Wilkening, Javin Pombra, Hsien-Hsin S. Lee, Gu-Yeon Wei, Carole-Jean Wu, David Brooks. 18 May 2021.
- A Hybrid Parallelization Approach for Distributed and Scalable Deep Learning. IEEE Access (IEEE Access), 2021. S. Akintoye, Liangxiu Han, Xin Zhang, Haoming Chen, Daoqiang Zhang. 11 Apr 2021.
- FPRaker: A Processing Element For Accelerating Neural Network Training. Omar Mohamed Awad, Mostafa Mahmoud, Isak Edo Vivancos, Ali Hadi Zadeh, Ciaran Bannon, Anand Jayarajan, Gennady Pekhimenko, Andreas Moshovos. 15 Oct 2020.
- Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights. Shail Dave, Riyadh Baghdadi, Tony Nowatzki, Sasikanth Avancha, Aviral Shrivastava, Baoxin Li. 02 Jul 2020.
- Non-Structured DNN Weight Pruning -- Is It Beneficial in Any Platform? IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2019. Xiaolong Ma, Sheng Lin, Shaokai Ye, Zhezhi He, Linfeng Zhang, ..., Deliang Fan, Xuehai Qian, Xinyu Lin, Kaisheng Ma, Yanzhi Wang. 03 Jul 2019. [MQ]