ResearchTrend.AI
On Scale-out Deep Learning Training for Cloud and HPC

24 January 2018
Srinivas Sridharan
K. Vaidyanathan
Dhiraj D. Kalamkar
Dipankar Das
Mikhail E. Smorkalov
Mikhail Shiryaev
Dheevatsa Mudigere
Naveen Mellempudi
Sasikanth Avancha
Bharat Kaul
Pradeep Dubey
    BDL
arXiv:1801.08030 · abs · PDF · HTML

Papers citing "On Scale-out Deep Learning Training for Cloud and HPC"

Showing 8 of 8 citing papers
Deep Learning Models on CPUs: A Methodology for Efficient Training
Quchen Fu
Ramesh Chukka
Keith Achorn
Thomas Atta-fosu
Deepak R. Canchi
Zhongwei Teng
Jules White
Douglas C. Schmidt
20 Jun 2022
Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures
Dhiraj D. Kalamkar
E. Georganas
Sudarshan Srinivasan
Jianping Chen
Mikhail Shiryaev
A. Heinecke
10 May 2020
Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training
IEEE Micro, 2019
Saptadeep Pal
Eiman Ebrahimi
A. Zulfiqar
Yaosheng Fu
Victor Zhang
Szymon Migacz
D. Nellans
Puneet Gupta
30 Jul 2019
High-Performance Deep Learning via a Single Building Block
E. Georganas
K. Banerjee
Dhiraj D. Kalamkar
Sasikanth Avancha
Anand Venkat
Michael J. Anderson
G. Henry
Hans Pabst
A. Heinecke
15 Jun 2019
Beyond the Memory Wall: A Case for Memory-centric HPC System for Deep Learning
Youngeun Kwon
Minsoo Rhu
18 Feb 2019
Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation
A. A. Awan
Jeroen Bédorf
Ching-Hsiang Chu
Hari Subramoni
D. Panda
GNN
25 Oct 2018
Anatomy Of High-Performance Deep Learning Convolutions On SIMD Architectures
E. Georganas
Sasikanth Avancha
K. Banerjee
Dhiraj D. Kalamkar
G. Henry
Hans Pabst
A. Heinecke
BDL
16 Aug 2018
TicTac: Accelerating Distributed Deep Learning with Communication Scheduling
Sayed Hadi Hashemi
Sangeetha Abdu Jyothi
R. Campbell
08 Mar 2018