ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Local SGD Accelerates Convergence by Exploiting Second Order Information of the Loss Function

arXiv:2305.15013 · 24 May 2023
Linxuan Pan, Shenghui Song
FedML

Papers citing "Local SGD Accelerates Convergence by Exploiting Second Order Information of the Loss Function"

2 of 2 papers shown
EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models
Jialiang Cheng, Ning Gao, Yun Yue, Zhiling Ye, Jiadi Jiang, Jian Sha
OffRL · 10 Dec 2024
The Loss Surfaces of Multilayer Networks
A. Choromańska, Mikael Henaff, Michaël Mathieu, Gerard Ben Arous, Yann LeCun
ODL · 30 Nov 2014