TRANSOM: An Efficient Fault-Tolerant System for Training LLMs

16 October 2023
Baodong Wu
Lei Xia
Qingping Li
Kangyu Li
Xu Chen
Yongqiang Guo
Tieyao Xiang
Yuheng Chen
Shigang Li

Papers citing "TRANSOM: An Efficient Fault-Tolerant System for Training LLMs"

4 / 4 papers shown
Unicron: Economizing Self-Healing LLM Training at Scale
Tao He
Xue Li
Zhibin Wang
Kun Qian
Jingbo Xu
Wenyuan Yu
Jingren Zhou
14
14
0
30 Dec 2023
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
245
1,071
0
05 Oct 2022
ZeRO-Offload: Democratizing Billion-Scale Model Training
Jie Ren
Samyam Rajbhandari
Reza Yazdani Aminabadi
Olatunji Ruwase
Shuangyang Yang
Minjia Zhang
Dong Li
Yuxiong He
MoE
160
413
0
18 Jan 2021
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
226
4,453
0
23 Jan 2020