2407.04480
LoCo: Low-Bit Communication Adaptor for Large-scale Model Training
5 July 2024
Xingyu Xie
Zhijie Lin
Kim-Chuan Toh
Pan Zhou
Papers citing "LoCo: Low-Bit Communication Adaptor for Large-scale Model Training"
8 / 8 papers shown
Title
Striving for Simplicity: Simple Yet Effective Prior-Aware Pseudo-Labeling for Semi-Supervised Ultrasound Image Segmentation
Yaxiong Chen
Yujie Wang
Zixuan Zheng
Jingliang Hu
Yilei Shi
Shengwu Xiong
Xiao Xiang Zhu
Lichao Mou
52
0
0
18 Mar 2025
LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models
Liang Zhao
Tianwen Wei
Liang Zeng
Cheng Cheng
Liu Yang
...
Yimeng Gan
Rui Hu
Shuicheng Yan
Han Fang
Yahui Zhou
LLMAG
SyDa
28
10
0
02 Jun 2024
FP8-LM: Training FP8 Large Language Models
Houwen Peng
Kan Wu
Yixuan Wei
Guoshuai Zhao
Yuxiang Yang
...
Zheng-Wei Zhang
Shuguang Liu
Joe Chau
Han Hu
Peng Cheng
MQ
59
37
0
27 Oct 2023
ZeRO++: Extremely Efficient Collective Communication for Giant Model Training
Guanhua Wang
Heyang Qin
S. A. Jacobs
Connor Holmes
Samyam Rajbhandari
Olatunji Ruwase
Feng Yan
Lei Yang
Yuxiong He
VLM
53
55
0
16 Jun 2023
Instruction Tuning with GPT-4
Baolin Peng
Chunyuan Li
Pengcheng He
Michel Galley
Jianfeng Gao
SyDa
ALM
LM&MA
157
576
0
06 Apr 2023
EF21-P and Friends: Improved Theoretical Communication Complexity for Distributed Optimization with Bidirectional Compression
Kaja Gruntkowska
A. Tyurin
Peter Richtárik
36
21
0
30 Sep 2022
IntSGD: Adaptive Floatless Compression of Stochastic Gradients
Konstantin Mishchenko
Bokun Wang
D. Kovalev
Peter Richtárik
67
14
0
16 Feb 2021
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
243
1,791
0
17 Sep 2019