Priority-based Parameter Propagation for Distributed DNN Training (arXiv:1905.03960)
10 May 2019
Anand Jayarajan, Jinliang Wei, Garth A. Gibson, Alexandra Fedorova, Gennady Pekhimenko

Papers citing "Priority-based Parameter Propagation for Distributed DNN Training" (12 papers)

Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression
Jaeyong Song, Jinkyu Yim, Jaewon Jung, Hongsun Jang, H. Kim, Youngsok Kim, Jinho Lee
24 Jan 2023

Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices
Max Ryabinin, Eduard A. Gorbunov, Vsevolod Plokhotnyuk, Gennady Pekhimenko
04 Mar 2021

DynaComm: Accelerating Distributed CNN Training between Edges and Clouds through Dynamic Communication Scheduling
Shangming Cai, Dongsheng Wang, Haixia Wang, Yongqiang Lyu, Guangquan Xu, Xi Zheng, A. Vasilakos
20 Jan 2021

Synthesizing Optimal Collective Algorithms
Zixian Cai, Zhengyang Liu, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Jacob Nelson, Olli Saarikivi
19 Aug 2020

DAPPLE: A Pipelined Data Parallel Approach for Training Large Models
Shiqing Fan, Yi Rong, Chen Meng, Zongyan Cao, Siyu Wang, ..., Jun Yang, Lixue Xia, Lansong Diao, Xiaoyong Liu, Wei Lin
02 Jul 2020

HetPipe: Enabling Large DNN Training on (Whimpy) Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism
Jay H. Park, Gyeongchan Yun, Chang Yi, N. T. Nguyen, Seungmin Lee, Jaesik Choi, S. Noh, Young-ri Choi
28 May 2020

Communication optimization strategies for distributed deep neural network training: A survey
Shuo Ouyang, Dezun Dong, Yemao Xu, Liquan Xiao
06 Mar 2020

Methods and Experiences for Developing Abstractions for Data-intensive, Scientific Applications
André Luckow, S. Jha
20 Feb 2020

DL2: A Deep Learning-driven Scheduler for Deep Learning Clusters
Yanghua Peng, Yixin Bao, Yangrui Chen, Chuan Wu, Chen Meng, Wei Lin
13 Sep 2019

Echo: Compiler-based GPU Memory Footprint Reduction for LSTM RNN Training
Bojian Zheng, Abhishek Tiwari, Nandita Vijaykumar, Gennady Pekhimenko
22 May 2018

Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training
Liang Luo, Jacob Nelson, Luis Ceze, Amar Phanishayee, Arvind Krishnamurthy
21 May 2018

Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability
J. Keuper, Franz-Josef Pfreundt
22 Sep 2016