FLM-101B: An Open LLM and How to Train It with $100K Budget

7 September 2023
Xiang Li
Yiqun Yao
Xin Jiang
Xuezhi Fang
Xuying Meng
Siqi Fan
Peng Han
Jing Li
Li Du
Bowen Qin
Zheng-Wei Zhang
Aixin Sun
Yequan Wang
Abstract

Large language models (LLMs) are considered important approaches towards foundational machine intelligence, achieving remarkable success in Natural Language Processing and multimodal tasks, among others. However, the carbon footprint and financial costs originating from heavy pre-training computation are a non-negligible issue. Progressive training methods, inspired by the neurogenesis process that grows neural structures, have shown potential to accelerate LLM pre-training. However, the algorithms, implementation, and practices for progressively training LLMs beyond 100B parameters remain underexplored. In this paper, we show that our model, namely FLM-101B, trained with our growth strategy under a budget of $100K, reaches 80% of the baselines' performances with only 10% of their floating-point operations. We believe that further studies on progressive training will benefit the community by cutting down the costs and promoting green AI. The checkpoint of FLM-101B is released at this https URL.
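To make the idea of progressive (growth-based) pre-training concrete, below is a minimal, hedged Python/PyTorch sketch: a small model is trained first, then deepened by reusing already-trained blocks, and training resumes on the larger model. This is an illustration of the general growth principle only, not the paper's actual FLM-101B growth operator or training stack; the names ToyLM, Block, grow_depth, and train_steps are hypothetical.

# Minimal sketch of growth-based pre-training (illustrative assumption,
# not the authors' exact method).
import copy
import torch
import torch.nn as nn

class Block(nn.Module):
    """A toy residual block standing in for a transformer layer."""
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        return x + self.ff(self.norm(x))

class ToyLM(nn.Module):
    """A tiny stand-in language model: embedding, a stack of blocks, and a head."""
    def __init__(self, vocab: int, dim: int, depth: int):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.blocks = nn.ModuleList(Block(dim) for _ in range(depth))
        self.head = nn.Linear(dim, vocab)

    def forward(self, ids):
        x = self.embed(ids)
        for blk in self.blocks:
            x = blk(x)
        return self.head(x)

def grow_depth(model: ToyLM, extra: int) -> ToyLM:
    """Deepen the model by appending copies of already-trained blocks.
    A common warm-start heuristic; real growth operators are more careful."""
    grown = copy.deepcopy(model)
    for i in range(extra):
        grown.blocks.append(copy.deepcopy(model.blocks[i % len(model.blocks)]))
    return grown

def train_steps(model, steps, vocab=100, seq=16):
    """Next-token training on random toy data, just to exercise the loop."""
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
    for _ in range(steps):
        ids = torch.randint(0, vocab, (8, seq))
        logits = model(ids[:, :-1])
        loss = nn.functional.cross_entropy(
            logits.reshape(-1, vocab), ids[:, 1:].reshape(-1))
        opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

small = ToyLM(vocab=100, dim=64, depth=2)      # cheap early stage
print("small-stage loss:", train_steps(small, 50))
large = grow_depth(small, extra=2)             # reuse trained weights, then keep training
print("grown-stage loss:", train_steps(large, 50))

The cost argument follows from this staging: most optimizer steps are spent while the model is still small, so the total floating-point operations are far lower than training the final-size model from scratch.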

View on arXiv (2309.03852)