ResearchTrend.AI
Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs

16 November 2017
S. Shi, Qiang-qiang Wang, Xiaowen Chu

Papers citing "Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs"

17 / 17 papers shown
FedSlate: A Federated Deep Reinforcement Learning Recommender System
Yongxin Deng, Xihe Qiu, Xiaoyu Tan, Yaochu Jin
FedML · 23 Sep 2024
A Generic Performance Model for Deep Learning in a Distributed Environment
Tulasi Kavarakuntla, Liangxiu Han, H. Lloyd, Annabel Latham, Anthony Kleerekoper, S. Akintoye
19 May 2023
Byzantine Fault Tolerance in Distributed Machine Learning: A Survey
Djamila Bouhata, Hamouma Moumen, Moumen Hamouma, Ahcène Bounceur
AI4CE · 05 May 2022
Accelerating Distributed K-FAC with Smart Parallelism of Computing and Communication Tasks
S. Shi, Lin Zhang, Bo-wen Li
14 Jul 2021
ResIST: Layer-Wise Decomposition of ResNets for Distributed Training
Chen Dun, Cameron R. Wolfe, C. Jermaine, Anastasios Kyrillidis
02 Jul 2021
Distributed Machine Learning for Wireless Communication Networks: Techniques, Architectures, and Applications
Shuyan Hu, Xiaojing Chen, Wei Ni, E. Hossain, Xin Wang
AI4CE · 02 Dec 2020
Characterizing and Modeling Distributed Training with Transient Cloud GPU Servers
Shijian Li, R. Walls, Tian Guo
07 Apr 2020
Communication-Efficient Decentralized Learning with Sparsification and Adaptive Peer Selection
Zhenheng Tang, S. Shi, X. Chu
FedML · 22 Feb 2020
MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning
S. Shi, X. Chu, Bo Li
FedML · 18 Dec 2019
On the Discrepancy between the Theoretical Analysis and Practical Implementations of Compressed Communication for Distributed Deep Learning
Aritra Dutta, El Houcine Bergou, A. Abdelmoniem, Chen-Yu Ho, Atal Narayan Sahu, Marco Canini, Panos Kalnis
19 Nov 2019
Throughput Prediction of Asynchronous SGD in TensorFlow
Zhuojin Li, Wumo Yan, Marco Paolieri, L. Golubchik
12 Nov 2019
Characterizing Deep Learning Training Workloads on Alibaba-PAI
Mengdi Wang, Chen Meng, Guoping Long, Chuan Wu, Jun Yang, Wei Lin, Yangqing Jia
14 Oct 2019
Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training
Saptadeep Pal, Eiman Ebrahimi, A. Zulfiqar, Yaosheng Fu, Victor Zhang, Szymon Migacz, D. Nellans, Puneet Gupta
30 Jul 2019
Priority-based Parameter Propagation for Distributed DNN Training
Anand Jayarajan, Jinliang Wei, Garth A. Gibson, Alexandra Fedorova, Gennady Pekhimenko
AI4CE · 10 May 2019
A Distributed Synchronous SGD Algorithm with Global Top-k Sparsification for Low Bandwidth Networks
S. Shi, Qiang-qiang Wang, Kaiyong Zhao, Zhenheng Tang, Yuxin Wang, Xiang Huang, Xiaowen Chu
14 Jan 2019
Parallax: Sparsity-aware Data Parallel Training of Deep Neural Networks
Soojeong Kim, Gyeong-In Yu, Hojin Park, Sungwoo Cho, Eunji Jeong, Hyeonmin Ha, Sanha Lee, Joo Seong Jeong, Byung-Gon Chun
08 Aug 2018
Stochastic Nonconvex Optimization with Large Minibatches
Weiran Wang, Nathan Srebro
25 Sep 2017