Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2303.02868
Cited By
Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent
6 March 2023
Xiaonan Nie
Yi Liu
Fangcheng Fu
J. Xue
Dian Jiao
Xupeng Miao
Yangyu Tao
Bin Cui
MoE
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent"
10 / 10 papers shown
Title
LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing
Xiaonan Nie
Qibin Liu
Fangcheng Fu
Shenhan Zhu
Xupeng Miao
X. Li
Y. Zhang
Shouda Liu
Bin Cui
MoE
23
1
0
13 Nov 2024
Malleus: Straggler-Resilient Hybrid Parallel Training of Large-scale Models via Malleable Data and Model Parallelization
Haoyang Li
Fangcheng Fu
Hao Ge
Sheng Lin
Xuanyu Wang
Jiawen Niu
Y. Wang
Hailin Zhang
Xiaonan Nie
Bin Cui
MoMe
31
2
0
17 Oct 2024
LuWu: An End-to-End In-Network Out-of-Core Optimizer for 100B-Scale Model-in-Network Data-Parallel Training on Distributed GPUs
Mo Sun
Zihan Yang
Changyue Liao
Yingtao Li
Fei Wu
Zeke Wang
52
1
0
02 Sep 2024
Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
Jiangfei Duan
Shuo Zhang
Zerui Wang
Lijuan Jiang
Wenwen Qu
...
Dahua Lin
Yonggang Wen
Xin Jin
Tianwei Zhang
Peng Sun
71
8
0
29 Jul 2024
FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning
Xupeng Miao
Gabriele Oliaro
Xinhao Cheng
Vineeth Kada
Ruohan Gao
...
April Yang
Yingcheng Wang
Mengdi Wu
Colin Unger
Zhihao Jia
MoE
88
9
0
29 Feb 2024
OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning
Youhe Jiang
Fangcheng Fu
Xupeng Miao
Xiaonan Nie
Bin Cui
22
11
0
17 May 2023
FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement
Xiaonan Nie
Xupeng Miao
Zilong Wang
Zichao Yang
Jilong Xue
Lingxiao Ma
Gang-Ming Cao
Bin Cui
MoE
32
44
0
08 Apr 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
308
11,909
0
04 Mar 2022
ZeRO-Offload: Democratizing Billion-Scale Model Training
Jie Ren
Samyam Rajbhandari
Reza Yazdani Aminabadi
Olatunji Ruwase
Shuangyang Yang
Minjia Zhang
Dong Li
Yuxiong He
MoE
160
413
0
18 Jan 2021
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
226
4,453
0
23 Jan 2020
1