ResearchTrend.AI
Asynchronous Local-SGD Training for Language Modeling


17 January 2024
Bo Liu
Rachita Chhaparia
Arthur Douillard
Satyen Kale
Andrei A. Rusu
Jiajun Shen
Arthur Szlam
Marc'Aurelio Ranzato
    FedML

Papers citing "Asynchronous Local-SGD Training for Language Modeling"

5 / 5 papers shown
Nesterov Method for Asynchronous Pipeline Parallel Optimization
Thalaiyasingam Ajanthan
Sameera Ramasinghe
Yan Zuo
Gil Avraham
Alexander Long
12
0
0
02 May 2025
Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training
Hiroki Naganuma
Xinzhi Zhang
Man-Chung Yue
Ioannis Mitliagkas
Philipp A. Witte
Russell J. Hewett
Yin Tat Lee
63
0
0
25 Apr 2025
EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models
Jialiang Cheng
Ning Gao
Yun Yue
Zhiling Ye
Jiadi Jiang
Jian Sha
OffRL
72
0
0
10 Dec 2024
No Need to Talk: Asynchronous Mixture of Language Models
Anastasiia Filippova
Angelos Katharopoulos
David Grangier
Ronan Collobert
MoE
33
0
0
04 Oct 2024
DiLoCo: Distributed Low-Communication Training of Language Models
Arthur Douillard
Qixuan Feng
Andrei A. Rusu
Rachita Chhaparia
Yani Donchev
A. Kuncoro
Marc'Aurelio Ranzato
Arthur Szlam
Jiajun Shen
53
31
0
14 Nov 2023