arXiv:1803.03288 (latest version: v2)
TicTac: Accelerating Distributed Deep Learning with Communication Scheduling
8 March 2018
Sayed Hadi Hashemi
Sangeetha Abdu Jyothi
R. Campbell
Papers citing
"TicTac: Accelerating Distributed Deep Learning with Communication Scheduling"
Showing 50 of 54 citing papers
MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production
Cheng Jin
Ziheng Jiang
Zhihao Bai
Zheng Zhong
Jing Liu
...
Yanghua Peng
Xuanzhe Liu
Xin Jin
Xin Liu
MoE
53
0
0
16 May 2025
Lumos: Efficient Performance Modeling and Estimation for Large-scale LLM Training
Mingyu Liang
Hiwot Tadese Kassa
Wenyin Fu
Brian Coutinho
Louis Feng
Christina Delimitrou
36
0
0
12 Apr 2025
Routing for Large ML Models
Ofir Cohen
Jose Yallouz
Michael Schapira
Shahar Belkar
Tal Mizrahi
73
0
0
07 Mar 2025
FLStore: Efficient Federated Learning Storage for non-training workloads
Ahmad Faraz Khan
Samuel Fountain
Ahmed M. Abdelmoniem
A. R. Butt
A. Anwar
FedML
112
0
0
01 Mar 2025
Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution
Haiquan Wang
Chaoyi Ruan
Jia He
Jiaqi Ruan
Chengjie Tang
Xiaosong Ma
Cheng-rong Li
149
1
0
24 Nov 2024
Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping
Guanhua Wang
Chengming Zhang
Zheyu Shen
Ang Li
Olatunji Ruwase
58
4
0
23 Sep 2024
Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
Jiangfei Duan
Shuo Zhang
Zerui Wang
Lijuan Jiang
Wenwen Qu
...
Dahua Lin
Yonggang Wen
Xin Jin
Tianwei Zhang
Peng Sun
141
12
0
29 Jul 2024
FedEx: Expediting Federated Learning over Heterogeneous Mobile Devices by Overlapping and Participant Selection
Jiaxiang Geng
Boyu Li
Xiaoqi Qin
Yixuan Li
Liang Li
Yanzhao Hou
Miao Pan
FedML
106
0
0
01 Jul 2024
ProTrain: Efficient LLM Training via Memory-Aware Techniques
Hanmei Yang
Jin Zhou
Yao Fu
Xiaoqun Wang
Ramine Roane
Hui Guan
Tongping Liu
VLM
83
1
0
12 Jun 2024
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Ziheng Jiang
Yanghua Peng
Yinmin Zhong
Qi Huang
Yangrui Chen
...
Zhe Li
X. Jia
Jia-jun Ye
Xin Jin
Xin Liu
LRM
124
120
0
23 Feb 2024
MLTCP: Congestion Control for DNN Training
S. Rajasekaran
Sanjoli Narang
Anton A. Zabreyko
M. Ghobadi
29
1
0
14 Feb 2024
On the Burstiness of Distributed Machine Learning Traffic
Natchanon Luangsomboon
Fahimeh Fazel
Jorg Liebeherr
A. Sobhani
Shichao Guan
Xingjun Chu
55
1
0
30 Dec 2023
Enhancing Neural Training via a Correlated Dynamics Model
Jonathan Brokman
Roy Betser
Rotem Turjeman
Tom Berkov
I. Cohen
Guy Gilboa
54
3
0
20 Dec 2023
OCCL: a Deadlock-free Library for GPU Collective Communication
Lichen Pan
Juncheng Liu
Jinhui Yuan
Rongkai Zhang
Pengze Li
Zhen Xiao
35
1
0
11 Mar 2023
Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression
Jaeyong Song
Jinkyu Yim
Jaewon Jung
Hongsun Jang
H. Kim
Youngsok Kim
Jinho Lee
GNN
61
27
0
24 Jan 2023
On Optimizing the Communication of Model Parallelism
Yonghao Zhuang
Hexu Zhao
Lianmin Zheng
Zhuohan Li
Eric P. Xing
Qirong Ho
Joseph E. Gonzalez
Ion Stoica
Haotong Zhang
111
26
0
10 Nov 2022
Accelerating Distributed MoE Training and Inference with Lina
Jiamin Li
Yimin Jiang
Yibo Zhu
Cong Wang
Hong-Yu Xu
MoE
76
63
0
31 Oct 2022
ByteComp: Revisiting Gradient Compression in Distributed Training
Zhuang Wang
Yanghua Peng
Yibo Zhu
T. Ng
51
2
0
28 May 2022
TopoOpt: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs
Weiyang Wang
Moein Khazraee
Zhizhen Zhong
M. Ghobadi
Zhihao Jia
Dheevatsa Mudigere
Ying Zhang
A. Kewitsch
120
92
0
01 Feb 2022
GC3: An Optimizing Compiler for GPU Collective Communication
M. Cowan
Saeed Maleki
Madan Musuvathi
Olli Saarikivi
Yifan Xiong
GNN
68
11
0
27 Jan 2022
Egeria: Efficient DNN Training with Knowledge-Guided Layer Freezing
Yiding Wang
D. Sun
Kai Chen
Fan Lai
Mosharaf Chowdhury
90
47
0
17 Jan 2022
Gridiron: A Technique for Augmenting Cloud Workloads with Network Bandwidth Requirements
N. Kodirov
Shane Bergsma
Syed M. Iqbal
Alan J. Hu
Ivan Beschastnikh
Margo Seltzer
25
0
0
12 Jan 2022
Automatic Configuration for Optimal Communication Scheduling in DNN Training
Yiqing Ma
Hao Wang
Yiming Zhang
Kai Chen
35
12
0
27 Dec 2021
OneFlow: Redesign the Distributed Deep Learning Framework from Scratch
Jinhui Yuan
Xinqi Li
Cheng Cheng
Juncheng Liu
Ran Guo
...
Fei Yang
Xiaodong Yi
Chuan Wu
Haoran Zhang
Jie Zhao
52
39
0
28 Oct 2021
EmbRace: Accelerating Sparse Communication for Distributed Training of NLP Neural Networks
Shengwei Li
Zhiquan Lai
Dongsheng Li
Yiming Zhang
Xiangyu Ye
Yabo Duan
FedML
56
3
0
18 Oct 2021
Scheduling Optimization Techniques for Neural Network Training
Hyungjun Oh
Junyeol Lee
HyeongJu Kim
Jiwon Seo
43
0
0
03 Oct 2021
CD-SGD: Distributed Stochastic Gradient Descent with Compression and Delay Compensation
Enda Yu
Dezun Dong
Yemao Xu
Shuo Ouyang
Xiangke Liao
40
5
0
21 Jun 2021
Pre-Trained Models: Past, Present and Future
Xu Han
Zhengyan Zhang
Ning Ding
Yuxian Gu
Xiao Liu
...
Jie Tang
Ji-Rong Wen
Jinhui Yuan
Wayne Xin Zhao
Jun Zhu
AIFin
MQ
AI4MH
165
859
0
14 Jun 2021
A Sum-of-Ratios Multi-Dimensional-Knapsack Decomposition for DNN Resource Scheduling
Menglu Yu
Chuan Wu
Bo Ji
Jia Liu
48
9
0
28 May 2021
Towards Quantized Model Parallelism for Graph-Augmented MLPs Based on Gradient-Free ADMM Framework
Junxiang Wang
Hongyi Li
Zheng Chai
Yongchao Wang
Yue Cheng
Liang Zhao
MQ
46
3
0
20 May 2021
Towards Demystifying Serverless Machine Learning Training
Jiawei Jiang
Shaoduo Gan
Yue Liu
Fanlin Wang
Gustavo Alonso
Ana Klimovic
Ankit Singla
Wentao Wu
Ce Zhang
67
126
0
17 May 2021
Distributed Learning Systems with First-order Methods
Ji Liu
Ce Zhang
36
44
0
12 Apr 2021
CrossoverScheduler: Overlapping Multiple Distributed Training Applications in a Crossover Manner
Cheng Luo
L. Qu
Youshan Miao
Peng Cheng
Y. Xiong
38
0
0
14 Mar 2021
On the Utility of Gradient Compression in Distributed Training Systems
Saurabh Agarwal
Hongyi Wang
Shivaram Venkataraman
Dimitris Papailiopoulos
95
47
0
28 Feb 2021
DynaComm: Accelerating Distributed CNN Training between Edges and Clouds through Dynamic Communication Scheduling
Shangming Cai
Dongsheng Wang
Haixia Wang
Yongqiang Lyu
Guangquan Xu
Xi Zheng
A. Vasilakos
59
6
0
20 Jan 2021
Crossover-SGD: A gossip-based communication in distributed deep learning for alleviating large mini-batch problem and enhancing scalability
Sangho Yeo
Minho Bae
Minjoong Jeong
Oh-Kyoung Kwon
Sangyoon Oh
50
3
0
30 Dec 2020
Srifty: Swift and Thrifty Distributed Training on the Cloud
Liangchen Luo
Peter West
Arvind Krishnamurthy
Luis Ceze
51
11
0
29 Nov 2020
FPRaker: A Processing Element For Accelerating Neural Network Training
Omar Mohamed Awad
Mostafa Mahmoud
Isak Edo Vivancos
Ali Hadi Zadeh
Ciaran Bannon
Anand Jayarajan
Gennady Pekhimenko
Andreas Moshovos
84
15
0
15 Oct 2020
Garfield: System Support for Byzantine Machine Learning
R. Guerraoui
Arsany Guirguis
El-Mahdi El-Mhamdi
Anton Alexandre Ragot
Sébastien Rouault
FedML
50
2
0
12 Oct 2020
Towards a Scalable and Distributed Infrastructure for Deep Learning Applications
Bita Hasheminezhad
S. Shirzad
Nanmiao Wu
Patrick Diehl
Hannes Schulz
Hartmut Kaiser
GNN
AI4CE
85
4
0
06 Oct 2020
Synthesizing Optimal Collective Algorithms
Zixian Cai
Zhengyang Liu
Saeed Maleki
Madan Musuvathi
Todd Mytkowicz
Jacob Nelson
Olli Saarikivi
GNN
81
61
0
19 Aug 2020
Domain-specific Communication Optimization for Distributed DNN Training
Hao Wang
Jingrong Chen
Xinchen Wan
Han Tian
Jiacheng Xia
Gaoxiong Zeng
Weiyan Wang
Kai Chen
Wei Bai
Junchen Jiang
AI4CE
26
16
0
16 Aug 2020
Step-Ahead Error Feedback for Distributed Training with Compressed Gradient
An Xu
Zhouyuan Huo
Heng-Chiao Huang
64
14
0
13 Aug 2020
Analyzing and Mitigating Data Stalls in DNN Training
Jayashree Mohan
Amar Phanishayee
Ashish Raniwala
Vijay Chidambaram
84
107
0
14 Jul 2020
PyTorch Distributed: Experiences on Accelerating Data Parallel Training
Shen Li
Yanli Zhao
R. Varma
Omkar Salpekar
P. Noordhuis
...
Adam Paszke
Jeff Smith
Brian Vaughan
Pritam Damania
Soumith Chintala
OOD
MoE
69
189
0
28 Jun 2020
Daydream: Accurately Estimating the Efficacy of Optimizations for DNN Training
Hongyu Zhu
Amar Phanishayee
Gennady Pekhimenko
145
50
0
05 Jun 2020
Communication-Aware Scheduling of Precedence-Constrained Tasks on Related Machines
Yu Su
Xiaoqi Ren
Shai Vardi
Adam Wierman
11
2
0
30 Apr 2020
Communication-Efficient Distributed Deep Learning: A Comprehensive Survey
Zhenheng Tang
Shaoshuai Shi
Wei Wang
Yue Liu
Xiaowen Chu
80
49
0
10 Mar 2020
Communication optimization strategies for distributed deep neural network training: A survey
Shuo Ouyang
Dezun Dong
Yemao Xu
Liquan Xiao
116
12
0
06 Mar 2020
Throughput Prediction of Asynchronous SGD in TensorFlow
Zhuojin Li
Wumo Yan
Marco Paolieri
L. Golubchik
36
5
0
12 Nov 2019