arXiv:2202.00433
TopoOpt: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs
1 February 2022
Weiyang Wang
Moein Khazraee
Zhizhen Zhong
M. Ghobadi
Zhihao Jia
Dheevatsa Mudigere
Ying Zhang
A. Kewitsch
Papers citing "TopoOpt: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs"
6 / 6 papers shown
Towards Easy and Realistic Network Infrastructure Testing for Large-scale Machine Learning
Jinsun Yoo
ChonLam Lao
Lianjie Cao
Bob Lantz
Minlan Yu
Tushar Krishna
Puneet Sharma
52
0
0
29 Apr 2025
MOPAR: A Model Partitioning Framework for Deep Learning Inference Services on Serverless Platforms
Jiaang Duan
Shiyou Qian
Dingyu Yang
Hanwen Hu
Jian Cao
Guangtao Xue
MoE
37
1
0
03 Apr 2024
Efficient All-to-All Collective Communication Schedules for Direct-Connect Topologies
P. Basu
Liangyu Zhao
Jason Fantl
Siddharth Pal
Arvind Krishnamurthy
J. Khoury
30
7
0
24 Sep 2023
Efficient Direct-Connect Topologies for Collective Communications
Liangyu Zhao
Siddharth Pal
Tapan Chugh
Weiyang Wang
Jason Fantl
P. Basu
J. Khoury
Arvind Krishnamurthy
22
6
0
07 Feb 2022
Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems
Maxim Naumov
John Kim
Dheevatsa Mudigere
Srinivas Sridharan
Xiaodong Wang
...
Krishnakumar Nair
Isabel Gao
Bor-Yiing Su
Jiyan Yang
M. Smelyanskiy
GNN
41
83
0
20 Mar 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
245
1,821
0
17 Sep 2019