PyTorch Distributed: Experiences on Accelerating Data Parallel Training

28 June 2020
Shen Li
Yanli Zhao
R. Varma
Omkar Salpekar
P. Noordhuis
Teng Li
Adam Paszke
Jeff Smith
Brian Vaughan
Pritam Damania
Soumith Chintala
OOD, MoE
arXiv:2006.15704 (abs) · PDF · HTML
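The paper above describes PyTorch's torch.nn.parallel.DistributedDataParallel (DDP) module, which replicates a model across processes and synchronizes gradients via all-reduce during the backward pass. For orientation, a minimal single-node usage sketch follows; it is illustrative only, and the model, data, and hyperparameters are placeholders, not taken from the paper or this page. It assumes a launch via torchrun (e.g., torchrun --nproc_per_node=2 train_ddp.py).

# Minimal DistributedDataParallel sketch (assumes launch via torchrun, which
# sets RANK, LOCAL_RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT).
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    use_cuda = torch.cuda.is_available()
    dist.init_process_group(backend="nccl" if use_cuda else "gloo")
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    device = torch.device(f"cuda:{local_rank}" if use_cuda else "cpu")

    # Placeholder model; DDP broadcasts parameters at construction and
    # all-reduces gradient buckets during backward().
    model = nn.Linear(32, 2).to(device)
    ddp_model = DDP(model, device_ids=[local_rank] if use_cuda else None)

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(10):
        x = torch.randn(64, 32, device=device)          # placeholder batch
        y = torch.randint(0, 2, (64,), device=device)   # placeholder labels
        optimizer.zero_grad()
        loss_fn(ddp_model(x), y).backward()             # gradients synchronized here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()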

Papers citing "PyTorch Distributed: Experiences on Accelerating Data Parallel Training"

50 / 60 papers shown

SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training
Jianyi Wang
Shanchuan Lin
Zhijie Lin
Yuxi Ren
Meng Wei
...
Yang Zhao
Ceyuan Yang
Xuefeng Xiao
Chen Change Loy
Lu Jiang
DiffM, VGen
05 Jun 2025

OmniLearn: A Framework for Distributed Deep Learning over Heterogeneous Clusters
S. Tyagi
Prateek Sharma
21 Mar 2025

Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training
Jared Fernandez
Luca Wehrstedt
Leonid Shamis
Mostafa Elhoushi
Kalyan Saladi
Yonatan Bisk
Emma Strubell
Jacob Kahn
20 Nov 2024

KDC-MAE: Knowledge Distilled Contrastive Mask Auto-Encoder
Maheswar Bora
Saurabh Atreya
Aritra Mukherjee
Abhijit Das
19 Nov 2024

Malleus: Straggler-Resilient Hybrid Parallel Training of Large-scale Models via Malleable Data and Model Parallelization
Haoyang Li
Fangcheng Fu
Hao Ge
Sheng Lin
Xuanyu Wang
Jiawen Niu
Yijiao Wang
Hailin Zhang
Xiaonan Nie
Tengjiao Wang
MoMe
17 Oct 2024

AdaShadow: Responsive Test-time Model Adaptation in Non-stationary Mobile Environments
Cheng Fang
Sicong Liu
Zimu Zhou
Bin Guo
Jiaqi Tang
Ke Ma
Zhiwen Yu
TTA
10 Oct 2024

Generating Origin-Destination Matrices in Neural Spatial Interaction Models
Ioannis Zachos
Mark Girolami
Theodoros Damoulas
09 Oct 2024

A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
Cong Guo
Feng Cheng
Zhixu Du
James Kiessling
Jonathan Ku
...
Qilin Zheng
Guanglei Zhou
Hai
Li-Wei Li
Yiran Chen
08 Oct 2024

PhysBERT: A Text Embedding Model for Physics Scientific Literature
Thorsten Hellert
Joao Montenegro
Andrea Pollastro
PINN, AI4CE
18 Aug 2024

On the Performance and Memory Footprint of Distributed Training: An Empirical Study on Transformers
Zhengxian Lu
Fangyu Wang
Zhiwei Xu
Fei Yang
Tao Li
02 Jul 2024

Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning
Zhijie Nie
Richong Zhang
Zhangchi Feng
Hailang Huang
Xudong Liu
26 Jun 2024

Reducing Memory Contention and I/O Congestion for Disk-based GNN Training
Qisheng Jiang
Lei Jia
Chundong Wang
GNN
20 Jun 2024

Breaking the Memory Wall: A Study of I/O Patterns and GPU Memory Utilization for Hybrid CPU-GPU Offloaded Optimizers
Avinash Maurya
Jie Ye
M. Rafique
Franck Cappello
Bogdan Nicolae
15 Jun 2024

DataStates-LLM: Lazy Asynchronous Checkpointing for Large Language Models
Avinash Maurya
Robert Underwood
M. Rafique
Franck Cappello
Bogdan Nicolae
15 Jun 2024

ORBIT: Oak Ridge Base Foundation Model for Earth System Predictability
Xiao Wang
A. Tsaris
Siyan Liu
Jong Youl Choi
Ming Fan
Wei Zhang
Ju Yin
M. Ashfaq
Dan Lu
Prasanna Balaprakash
23 Apr 2024

AntDT: A Self-Adaptive Distributed Training Framework for Leader and Straggler Nodes
Youshao Xiao
Lin Ju
Zhenglei Zhou
Siyuan Li
Zhaoxin Huan
...
Rujie Jiang
Lin Wang
Xiaolu Zhang
Lei Liang
Jun Zhou
15 Apr 2024

Accurate Patient Alignment without Unnecessary Imaging Dose via Synthesizing Patient-specific 3D CT Images from 2D kV Images
Yuzhen Ding
J. Holmes
H. Feng
Baoxin Li
Lisa A. McGee
...
S. A. Vora
Daniel J. Ma
Robert L. Foote
Samir H. Patel
Wei Liu
01 Apr 2024

A Unified CPU-GPU Protocol for GNN Training
Yi-Chien Lin
Gangda Deng
Viktor Prasanna
GNN, 3DH
25 Mar 2024

Partitioned Neural Network Training via Synthetic Intermediate Labels
C. V. Karadag
Nezih Topaloglu
17 Mar 2024

DeepVM: Integrating Spot and On-Demand VMs for Cost-Efficient Deep Learning Clusters in the Cloud
Yoochan Kim
Kihyun Kim
Yonghyeon Cho
Jinwoo Kim
Awais Khan
Ki-Dong Kang
B. An
Myung-Hoon Cha
H. Kim
Youngjae Kim
09 Mar 2024

Activations and Gradients Compression for Model-Parallel Training
Mikhail Rudakov
Aleksandr Beznosikov
Yaroslav Kholodov
Alexander Gasnikov
15 Jan 2024

Tenplex: Dynamic Parallelism for Deep Learning using Parallelizable Tensor Collections
Marcel Wagenlander
Guo Li
Bo Zhao
Kai Zou
Peter R. Pietzuch
08 Dec 2023

Flexible Communication for Optimal Distributed Learning over Unpredictable Networks
S. Tyagi
Martin Swany
05 Dec 2023

vTrain: A Simulation Framework for Evaluating Cost-effective and Compute-optimal Large Language Model Training
Jehyeon Bang
Yujeong Choi
Myeongwoo Kim
Yongdeok Kim
Minsoo Rhu
27 Nov 2023

Large Language Models in Law: A Survey
Jinqi Lai
Wensheng Gan
Jiayang Wu
Zhenlian Qi
Philip S. Yu
ELM, AILaw
26 Nov 2023

Near-Linear Scaling Data Parallel Training with Overlapping-Aware Gradient Compression
Lin Meng
Yuzhong Sun
Weimin Li
08 Nov 2023

High Throughput Training of Deep Surrogates from Large Ensemble Runs
Lucas Meyer
M. Schouler
R. Caulk
Alejandro Ribés
Bruno Raffin
AI4CE
28 Sep 2023

Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
Haojun Xia
Zhen Zheng
Yuchao Li
Donglin Zhuang
Zhongzhu Zhou
Xiafei Qiu
Yong Li
Wei Lin
Shuaiwen Leon Song
19 Sep 2023

Oobleck: Resilient Distributed Training of Large Models Using Pipeline Templates
Insu Jang
Zhenning Yang
Zhen Zhang
Xin Jin
Mosharaf Chowdhury
MoE, AI4CE, OODD
15 Sep 2023

Breaking Boundaries: Distributed Domain Decomposition with Scalable Physics-Informed Neural PDE Solvers
Arthur Feeney
Zitong Li
Ramin Bostanabad
Aparna Chandramowlishwaran
AI4CE
28 Aug 2023

Evaluation and Optimization of Gradient Compression for Distributed Deep Learning
Lin Zhang
Longteng Zhang
Shaoshuai Shi
Xiaowen Chu
Yue Liu
OffRL
15 Jun 2023

DistSim: A performance model of large-scale hybrid distributed DNN training
Guandong Lu
Run Chen
Yakai Wang
Yangjie Zhou
Rui Zhang
...
Yanming Miao
Zhifang Cai
Li-Wei Li
Jingwen Leng
Minyi Guo
14 Jun 2023

GraVAC: Adaptive Compression for Communication-Efficient Distributed DL Training
S. Tyagi
Martin Swany
20 May 2023

SparDL: Distributed Deep Learning Training with Efficient Sparse Communication
Minjun Zhao
Yichen Yin
Yuren Mao
Qing Liu
Lu Chen
Yunjun Gao
03 Apr 2023

ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale
William Won
Taekyung Heo
Saeed Rashidi
Srinivas Sridharan
Sudarshan Srinivasan
T. Krishna
24 Mar 2023

MCR-DL: Mix-and-Match Communication Runtime for Deep Learning
Quentin G. Anthony
A. A. Awan
Jeff Rasley
Yuxiong He
Mustafa Abduljabbar
Hari Subramoni
D. Panda
MoE
15 Mar 2023

A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training
Siddharth Singh
Olatunji Ruwase
A. A. Awan
Samyam Rajbhandari
Yuxiong He
A. Bhatele
MoE
11 Mar 2023

DeAR: Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining
Lin Zhang
Shaoshuai Shi
Xiaowen Chu
Wei Wang
Yue Liu
Chengjian Liu
24 Feb 2023

Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training
Hongzheng Chen
Cody Hao Yu
Shuai Zheng
Zhen Zhang
Zhiru Zhang
Yida Wang
16 Feb 2023

Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression
Jaeyong Song
Jinkyu Yim
Jaewon Jung
Hongsun Jang
H. Kim
Youngsok Kim
Jinho Lee
GNN
24 Jan 2023

Reproducible scaling laws for contrastive language-image learning
Mehdi Cherti
Romain Beaumont
Ross Wightman
Mitchell Wortsman
Gabriel Ilharco
Cade Gordon
Christoph Schuhmann
Ludwig Schmidt
J. Jitsev
VLM, CLIP
14 Dec 2022

Accelerating Self-Supervised Learning via Efficient Training Strategies
Mustafa Taha Koçyiğit
Timothy M. Hospedales
Hakan Bilen
SSL
11 Dec 2022

Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism
Xupeng Miao
Yujie Wang
Youhe Jiang
Chunan Shi
Xiaonan Nie
Hailin Zhang
Tengjiao Wang
GNN, MoE
25 Nov 2022

Distributed Graph Neural Network Training: A Survey
Yingxia Shao
Hongzheng Li
Xizhi Gu
Hongbo Yin
Yawen Li
Xupeng Miao
Wentao Zhang
Tengjiao Wang
Lei Chen
GNN, AI4CE
01 Nov 2022

PARTIME: Scalable and Parallel Processing Over Time with Deep Neural Networks
Enrico Meloni
Lapo Faggi
Simone Marullo
Alessandro Betti
Matteo Tiezzi
Marco Gori
S. Melacci
GNN, AI4TS
17 Oct 2022

Communication-Efficient Adam-Type Algorithms for Distributed Data Mining
Wenhan Xian
Feihu Huang
Heng-Chiao Huang
FedML
14 Oct 2022

An Overview of the Data-Loader Landscape: Comparative Performance Analysis
Iason Ofeidis
Diego Kiedanski
Leandros Tassiulas
27 Sep 2022

Optimizing DNN Compilation for Distributed Training with Joint OP and Tensor Fusion
Xiaodong Yi
Shiwei Zhang
Lansong Diao
Chuan Wu
Zhen Zheng
Shiqing Fan
Siyu Wang
Jun Yang
W. Lin
26 Sep 2022

MLLess: Achieving Cost Efficiency in Serverless Machine Learning Training
Pablo Gimeno Sarroca
Marc Sánchez Artigas
12 Jun 2022

Merak: An Efficient Distributed DNN Training Framework with Automated 3D Parallelism for Giant Foundation Models
Zhiquan Lai
Shengwei Li
Xudong Tang
Ke-shi Ge
Weijie Liu
Yabo Duan
Linbo Qiao
Dongsheng Li
10 Jun 2022