Highly Available Data Parallel ML training on Mesh Networks
arXiv: 2011.03605 · 6 November 2020
Sameer Kumar, N. Jouppi
Papers citing "Highly Available Data Parallel ML training on Mesh Networks" (4 of 4 shown):
FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models
Saeed Rashidi, William Won, Sudarshan Srinivasan, Puneet Gupta, Tushar Krishna
28 Jun 2024
Near-Optimal Wafer-Scale Reduce
Piotr Luczynski, Lukas Gianinazzi, Patrick Iff, Leighton Wilson, Daniele De Sensi, Torsten Hoefler
24 Apr 2024
TACOS: Topology-Aware Collective Algorithm Synthesizer for Distributed Machine Learning
MICRO, 2023
William Won, Suvinay Subramanian, Sudarshan Srinivasan, A. Durg, Samvit Kaul, Swati Gupta, Tushar Krishna
11 Apr 2023
On the Generalization Mystery in Deep Learning
S. Chatterjee, Piotr Zielinski
18 Mar 2022