ResearchTrend.AI

CO2: Efficient Distributed Training with Full Communication-Computation Overlap

29 January 2024
Weigao Sun
Zhen Qin
Weixuan Sun
Shidi Li
Dong Li
Xuyang Shen
Yu Qiao
Yiran Zhong
    OffRL

Papers citing "CO2: Efficient Distributed Training with Full Communication-Computation Overlap"

9 / 9 papers shown
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Xiaoye Qu
Yafu Li
Zhaochen Su
Weigao Sun
Jianhao Yan
...
Chaochao Lu
Yue Zhang
Xian-Sheng Hua
Bowen Zhou
Yu Cheng
ReLM
OffRL
LRM
80
11
0
27 Mar 2025
MoM: Linear Sequence Modeling with Mixture-of-Memories
Jusen Du
Weigao Sun
Disen Lan
Jiaxi Hu
Yu-Xi Cheng
KELM
75
3
0
19 Feb 2025
EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models
Jialiang Cheng
Ning Gao
Yun Yue
Zhiling Ye
Jiadi Jiang
Jian Sha
OffRL
72
0
0
10 Dec 2024
Distributed Sign Momentum with Local Steps for Training Transformers
Shuhua Yu
Ding Zhou
Cong Xie
An Xu
Zhi-Li Zhang
Xin Liu
S. Kar
59
0
0
26 Nov 2024
ACCO: Accumulate while you Communicate, Hiding Communications in Distributed LLM Training
Adel Nabli
Louis Fournier
Pierre Erbacher
Louis Serrano
Eugene Belilovsky
Edouard Oyallon
FedML
30
0
0
03 Jun 2024
Linear Attention Sequence Parallelism
Weigao Sun
Zhen Qin
Dong Li
Xuyang Shen
Yu Qiao
Yiran Zhong
68
2
0
03 Apr 2024
MS-Net: A Multi-Path Sparse Model for Motion Prediction in Multi-Scenes
Xiaqiang Tang
Weigao Sun
Siyuan Hu
Yiyang Sun
Yafeng Guo
35
4
0
01 Mar 2024
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
Zhen Qin
Weigao Sun
Dong Li
Xuyang Shen
Weixuan Sun
Yiran Zhong
62
21
0
09 Jan 2024
Feature Pyramid Networks for Object Detection
Tsung-Yi Lin
Piotr Dollár
Ross B. Girshick
Kaiming He
Bharath Hariharan
Serge J. Belongie
ObjD
166
21,643
0
09 Dec 2016