Speeding up Deep Learning with Transient Servers

28 February 2019

Papers citing "Speeding up Deep Learning with Transient Servers"


Title | Authors | Tag | Date
Taming Resource Heterogeneity In Distributed ML Training With Dynamic Batching | S. Tyagi, Prateek Sharma | | 20 May 2023
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient | Max Ryabinin, Tim Dettmers, Michael Diskin, Alexander Borzunov | MoE | 27 Jan 2023
Characterizing and Modeling Distributed Training with Transient Cloud GPU Servers | Shijian Li, R. Walls, Tian Guo | | 07 Apr 2020