ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2401.00329
17
1

On the Burstiness of Distributed Machine Learning Traffic

30 December 2023
Natchanon Luangsomboon
Fahimeh Fazel
Jorg Liebeherr
A. Sobhani
Shichao Guan
Xingjun Chu
ArXivPDFHTML
Abstract

Traffic from distributed training of machine learning (ML) models makes up a large and growing fraction of the traffic mix in enterprise data centers. While work on distributed ML abounds, the network traffic generated by distributed ML has received little attention. Using measurements on a testbed network, we investigate the traffic characteristics generated by the training of the ResNet-50 neural network with an emphasis on studying its short-term burstiness. For the latter we propose metrics that quantify traffic burstiness at different time scales. Our analysis reveals that distributed ML traffic exhibits a very high degree of burstiness on short time scales, exceeding a 60:1 peak-to-mean ratio on time intervals as long as 5~ms. We observe that training software orchestrates transmissions in such a way that burst transmissions from different sources within the same application do not result in congestion and packet losses. An extrapolation of the measurement data to multiple applications underscores the challenges of distributed ML traffic for congestion and flow control algorithms.

View on arXiv
Comments on this paper