ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs

28 February 2025
Hao Ge
Junda Feng
Qi Huang
Fangcheng Fu
Xiaonan Nie
Lei Zuo
Haibin Lin
Bin Cui
Xin Liu

Papers citing "ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs"

OVERLORD: Ultimate Scaling of DataLoader for Multi-Source Large Foundation Model Training
Juntao Zhao
Qi Lu
Wei Jia
Borui Wan
Lei Zuo
...
Y. Hu
Yanghua Peng
H. Lin
Xin Liu
Chuan Wu
AI4CE
14 Apr 2025