arXiv:2011.03605

Highly Available Data Parallel ML training on Mesh Networks
Sameer Kumar, N. Jouppi
6 November 2020
Papers citing "Highly Available Data Parallel ML training on Mesh Networks" (4 papers shown)
FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models
Saeed Rashidi, William Won, Sudarshan Srinivasan, Puneet Gupta, Tushar Krishna
28 Jun 2024
Near-Optimal Wafer-Scale Reduce
Piotr Luczynski, Lukas Gianinazzi, Patrick Iff, Leighton Wilson, Daniele De Sensi, Torsten Hoefler
24 Apr 2024
TACOS: Topology-Aware Collective Algorithm Synthesizer for Distributed Machine Learning
William Won, Suvinay Subramanian, Sudarshan Srinivasan, A. Durg, Samvit Kaul, Swati Gupta, Tushar Krishna
11 Apr 2023
On the Generalization Mystery in Deep Learning
S. Chatterjee, Piotr Zielinski
18 Mar 2022