STEP: Staged Parameter-Efficient Pre-training for Large Language Models

5 April 2025
Kazuki Yano, Takumi Ito, Jun Suzuki
Abstract

Pre-training large language models (LLMs) faces significant memory challenges due to the large size of model parameters. We introduce STaged parameter-Efficient Pre-training (STEP), which integrates parameter-efficient tuning techniques with model growth. We conduct experiments on pre-training LLMs of various sizes and demonstrate that STEP achieves up to a 53.9% reduction in maximum memory requirements compared to vanilla pre-training while maintaining equivalent performance. Furthermore, we show that models pre-trained with STEP perform comparably to vanilla pre-trained models on downstream tasks after instruction tuning.
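
The following is a minimal toy sketch (in PyTorch) of the general idea the abstract describes: grow a small pre-trained model by stacking new Transformer blocks, freeze the earlier blocks behind LoRA-style low-rank adapters, and keep optimizer state only for the trainable parameters. All names and design choices here (LoRALinear, grow_with_adapters, layer-stacking growth, adapter rank) are illustrative assumptions, not the authors' implementation; see the paper for the actual STEP procedure.

# Illustrative sketch only; the layer-stacking growth and LoRA-style adapters
# below are stand-ins for the paper's "model growth" and "parameter-efficient
# tuning" components, chosen for brevity.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (W x + B A x)."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # frozen: no optimizer state kept
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x):
        return self.base(x) + x @ self.lora_a.T @ self.lora_b.T


def make_block(d_model: int) -> nn.TransformerEncoderLayer:
    return nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)


def grow_with_adapters(stage1_blocks, n_new_blocks, d_model):
    """Stage 2: keep stage-1 blocks frozen behind adapters, add fresh blocks."""
    grown = []
    for blk in stage1_blocks:
        for p in blk.parameters():
            p.requires_grad = False
        # Wrap the feed-forward projections with trainable low-rank adapters.
        blk.linear1 = LoRALinear(blk.linear1)
        blk.linear2 = LoRALinear(blk.linear2)
        grown.append(blk)
    grown.extend(make_block(d_model) for _ in range(n_new_blocks))
    return nn.Sequential(*grown)


if __name__ == "__main__":
    d_model = 64
    # Stage 1: pre-train a small model (here, 2 fully trainable blocks).
    stage1 = nn.Sequential(*[make_block(d_model) for _ in range(2)])
    # Stage 2: grow to 4 blocks; only adapters and new blocks receive gradients.
    stage2 = grow_with_adapters(list(stage1), 2, d_model)
    trainable = [p for p in stage2.parameters() if p.requires_grad]
    opt = torch.optim.AdamW(trainable, lr=1e-4)  # optimizer state only for these
    x = torch.randn(8, 16, d_model)
    loss = stage2(x).pow(2).mean()
    loss.backward()
    opt.step()
    total = sum(p.numel() for p in stage2.parameters())
    print(f"trainable params: {sum(p.numel() for p in trainable)} / {total}")

The memory saving comes from the last few lines: the AdamW optimizer allocates moment buffers only for the trainable subset (adapters plus newly added blocks), while the frozen stage-1 weights carry neither gradients nor optimizer state.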

@article{yano2025_2504.04151,
  title={STEP: Staged Parameter-Efficient Pre-training for Large Language Models},
  author={Kazuki Yano and Takumi Ito and Jun Suzuki},
  journal={arXiv preprint arXiv:2504.04151},
  year={2025}
}