ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

PyTorch Distributed: Experiences on Accelerating Data Parallel Training

28 June 2020
Shen Li, Yanli Zhao, R. Varma, Omkar Salpekar, P. Noordhuis, Teng Li, Adam Paszke, Jeff Smith, Brian Vaughan, Pritam Damania, Soumith Chintala
OOD · MoE
arXiv: 2006.15704 (abs / PDF / HTML)

Papers citing "PyTorch Distributed: Experiences on Accelerating Data Parallel Training"

10 / 60 papers shown
FuncPipe: A Pipelined Serverless Framework for Fast and Cost-efficient Training of Deep Learning Models
Yunzhuo Liu, Bo Jiang, Tian Guo, Zimeng Huang, Wen-ping Ma, Xinbing Wang, Chenghu Zhou
73 · 9 · 0 · 28 Apr 2022

PICASSO: Unleashing the Potential of GPU-centric Training for Wide-and-deep Recommender Systems
Yuanxing Zhang, Langshi Chen, Siran Yang, Man Yuan, Hui-juan Yi, ..., Yong Li, Dingyang Zhang, Wei Lin, Lin Qu, Bo Zheng
78 · 32 · 0 · 11 Apr 2022

HeterPS: Distributed Deep Learning With Reinforcement Learning Based Scheduling in Heterogeneous Environments
Ji Liu, Zhihua Wu, Dianhai Yu, Yanjun Ma, Danlei Feng, Minxu Zhang, Xinxuan Wu, Xuefeng Yao, Dejing Dou
76 · 49 · 0 · 20 Nov 2021

Graph Neural Network Training with Data Tiering
S. Min, Kun Wu, Mert Hidayetoğlu, Jinjun Xiong, Xiang Song, Wen-mei W. Hwu
GNN · 58 · 16 · 0 · 10 Nov 2021

Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training
Yongbin Li, Hongxin Liu, Zhengda Bian, Boxiang Wang, Haichen Huang, Fan Cui, Chuan-Qing Wang, Yang You
GNN · 111 · 149 · 0 · 28 Oct 2021

Looper: An end-to-end ML platform for product decisions
I. Markov, Hanson Wang, Nitya Kasturi, Shaun Singh, Szeto Wai Yuen, ..., Michael Belkin, Sal Uryasev, Sam Howie, E. Bakshy, Norm Zhou
OffRL · 107 · 15 · 0 · 14 Oct 2021

Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models
Dheevatsa Mudigere, Y. Hao, Jianyu Huang, Zhihao Jia, Andrew Tulloch, ..., Ajit Mathews, Lin Qiao, M. Smelyanskiy, Bill Jia, Vijay Rao
109 · 154 · 0 · 12 Apr 2021

CrossoverScheduler: Overlapping Multiple Distributed Training Applications in a Crossover Manner
Cheng Luo, L. Qu, Youshan Miao, Peng Cheng, Y. Xiong
41 · 0 · 0 · 14 Mar 2021

Taming Momentum in a Distributed Asynchronous Environment
Ido Hakimi, Saar Barkai, Moshe Gabel, Assaf Schuster
93 · 23 · 0 · 26 Jul 2019

Koji: Automating pipelines with mixed-semantics data sources
P. Maymounkov
29 · 6 · 0 · 02 Dec 2018