ResearchTrend.AI

SlipStream: Adapting Pipelines for Distributed Training of Large DNNs Amid Failures

22 May 2024
Swapnil Gandhi
Mark Zhao
Athinagoras Skiadopoulos
Christos Kozyrakis
    AI4CE
    GNN

Papers citing "SlipStream: Adapting Pipelines for Distributed Training of Large DNNs Amid Failures"

5 / 5 papers shown
Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines
Shigang Li
Torsten Hoefler
GNN
AI4CE
LRM
77
130
0
14 Jul 2021
ZeRO-Offload: Democratizing Billion-Scale Model Training
Jie Ren
Samyam Rajbhandari
Reza Yazdani Aminabadi
Olatunji Ruwase
Shuangyang Yang
Minjia Zhang
Dong Li
Yuxiong He
MoE
160
413
0
18 Jan 2021
Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems
Maxim Naumov
John Kim
Dheevatsa Mudigere
Srinivas Sridharan
Xiaodong Wang
...
Krishnakumar Nair
Isabel Gao
Bor-Yiing Su
Jiyan Yang
M. Smelyanskiy
GNN
33
83
0
20 Mar 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
226
4,424
0
23 Jan 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
243
1,815
0
17 Sep 2019