ResearchTrend.AI

Steel-LLM: From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM

10 February 2025
Qingshui Gu
Shu Li
Tianyu Zheng
Zhaoxiang Zhang
Abstract

Steel-LLM is a Chinese-centric language model developed from scratch with the goal of creating a high-quality, open-source model despite limited computational resources. Launched in March 2024, the project aimed to train a 1-billion-parameter model on a large-scale dataset, prioritizing transparency and the sharing of practical insights to assist others in the community. The training process primarily focused on Chinese data, with a small proportion of English data included, addressing gaps in existing open-source LLMs by providing a more detailed and practical account of the model-building journey. Steel-LLM has demonstrated competitive performance on benchmarks such as CEVAL and CMMLU, outperforming early models from larger institutions. This paper provides a comprehensive summary of the project's key contributions, including data collection, model design, training methodologies, and the challenges encountered along the way, offering a valuable resource for researchers and practitioners looking to develop their own LLMs. The model checkpoints and training script are available at this https URL.

@article{gu2025_2502.06635,
  title={Steel-LLM: From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM},
  author={Qingshui Gu and Shu Li and Tianyu Zheng and Zhaoxiang Zhang},
  journal={arXiv preprint arXiv:2502.06635},
  year={2025}
}