Accumulated Decoupled Learning: Mitigating Gradient Staleness in Inter-Layer Model Parallelization

3 December 2020

Zhiping Lin

Papers citing "Accumulated Decoupled Learning: Mitigating Gradient Staleness in Inter-Layer Model Parallelization"


PETRA: Parallel End-to-end Training with Reversible Architectures · Stéphane Rivaud, Louis Fournier, Thomas Pumir, Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon · 25 0 0 · 04 Jun 2024
Aggregated Residual Transformations for Deep Neural Networks · Saining Xie, Ross B. Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He · 300 10,233 0 · 16 Nov 2016
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima · N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang · ODL · 308 2,892 0 · 15 Sep 2016