Interlocking Backpropagation: Improving depthwise model-parallelism

arXiv: 2010.04116
8 October 2020
Aidan Gomez, Oscar Key, Kuba Perlin, Stephen Gou, Nick Frosst, J. Dean, Y. Gal
ArXiv | PDF | HTML

Papers citing "Interlocking Backpropagation: Improving depthwise model-parallelism"

7 / 7 papers shown
Asynchronous Stochastic Gradient Descent with Decoupled Backpropagation and Layer-Wise Updates
Cabrel Teguemne Fokam, Khaleelulla Khan Nazeer, Lukas König, David Kappel, Anand Subramoney
08 Oct 2024

Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey
Feng Liang, Zhen Zhang, Haifeng Lu, Victor C. M. Leung, Yanyi Guo, Xiping Hu
Topics: GNN
09 Apr 2024

Can Forward Gradient Match Backpropagation?
Louis Fournier, Stéphane Rivaud, Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon
12 Jun 2023

Backpropagation-free Training of Deep Physical Neural Networks
Ali Momeni, Babak Rahmani, M. Malléjac, Philipp del Hougne, Romain Fleury
Topics: AI4CE, PINN
20 Apr 2023

Scaling Forward Gradient With Local Losses
Mengye Ren, Simon Kornblith, Renjie Liao, Geoffrey E. Hinton
07 Oct 2022

Training Deep Architectures Without End-to-End Backpropagation: A Survey on the Provably Optimal Methods
Shiyu Duan, José C. Príncipe
Topics: MQ
09 Jan 2021

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
Topics: MoE
17 Sep 2019