
Highly Available Data Parallel ML training on Mesh Networks

arXiv:2011.03605, 6 November 2020

Sameer Kumar, N. Jouppi

Topics: MoE, AI4CE

Papers citing "Highly Available Data Parallel ML training on Mesh Networks"

4 papers
FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models
Saeed Rashidi, William Won, Sudarshan Srinivasan, Puneet Gupta, Tushar Krishna
28 Jun 2024
Near-Optimal Wafer-Scale Reduce
Piotr Luczynski, Lukas Gianinazzi, Patrick Iff, Leighton Wilson, Daniele De Sensi, Torsten Hoefler
24 Apr 2024
TACOS: Topology-Aware Collective Algorithm Synthesizer for Distributed Machine Learning (MICRO, 2023)
William Won, Suvinay Subramanian, Sudarshan Srinivasan, A. Durg, Samvit Kaul, Swati Gupta, Tushar Krishna
11 Apr 2023
On the Generalization Mystery in Deep Learning
S. Chatterjee, Piotr Zielinski
Topics: OOD
18 Mar 2022