ResearchTrend.AI
Scaling Deep Learning Training with MPMD Pipeline Parallelism

18 December 2024
Anxhelo Xhebraj
Sean Lee
Hanfeng Chen
Vinod Grover
Main: 11 pages · 12 figures · 1 table · Bibliography: 2 pages · Appendix: 1 page
Abstract

We present JaxPP, a system for efficiently scaling the training of large deep learning models with flexible pipeline parallelism. We introduce a seamless programming model that lets users implement their own pipeline schedules for gradient accumulation. JaxPP automatically distributes tasks, corresponding to pipeline stages, over a cluster of nodes and automatically infers the communication among them. We implement an MPMD runtime for asynchronous execution of SPMD tasks. The pipeline parallelism implementation of JaxPP improves hardware utilization by up to 1.11× with respect to the best-performing SPMD configuration.
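The gradient-accumulation semantics that such pipeline schedules must preserve can be sketched in a few lines of plain Python. This is an illustrative sketch only, not JaxPP's API: the function names are hypothetical, and a real schedule (e.g. GPipe or 1F1B) reorders *when* each microbatch's forward and backward pass runs across stages, while keeping the accumulated gradient identical to the full-batch gradient.

```python
# Sketch: per-microbatch gradients summed over a "schedule" must equal
# the full-batch gradient. All names here are illustrative, not JaxPP's.

def grad_w(w, xs, ys):
    # Gradient of the loss sum((w*x - y)^2) w.r.t. a scalar weight w.
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys))

def accumulated_grad(w, xs, ys, num_microbatches):
    # One backward pass per microbatch; gradients are accumulated and
    # the optimizer step (not shown) is applied once at the end.
    step = len(xs) // num_microbatches
    total = 0.0
    for m in range(num_microbatches):
        lo, hi = m * step, (m + 1) * step
        total += grad_w(w, xs[lo:hi], ys[lo:hi])
    return total

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
full = grad_w(1.5, xs, ys)
accum = accumulated_grad(1.5, xs, ys, num_microbatches=2)
assert abs(full - accum) < 1e-9  # schedule-independent result
```

Because the accumulated result is independent of execution order, a runtime is free to interleave forward and backward tasks of different microbatches across pipeline stages to keep hardware busy, which is the flexibility the abstract describes.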
