Revolutionizing Large Language Model Training through Dynamic Parameter Adjustment

3 June 2024

Kaiye Zhou

Shucheng Wang


Main: 11 pages; Bibliography: 4 pages; Appendix: 9 pages; 11 figures; 12 tables

Abstract

In the era of large language models, efficient use of computational resources has become critically important. Although parameter-efficient fine-tuning techniques achieve results comparable to full fine-tuning, applying them during the pre-training phase poses significant challenges. Specifically, employing parameter-efficient strategies at the onset of pre-training can severely compromise efficiency, especially in larger models. In this paper, building upon the fine-tuning method LoRA, we introduce a novel parameter-efficient training technique that frequently changes which part of the parameters is trainable, facilitating effective pre-training. During the pre-training phase, our method achieves memory reductions and computational overhead comparable to current state-of-the-art parameter-efficient algorithms while maintaining accuracy comparable to that of full pre-training. We provide both theoretical analyses and empirical evidence to demonstrate the effectiveness of our approach.
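To make the idea of changing the trainable part of a LoRA-style layer concrete, the following is a minimal sketch and not the paper's algorithm. It assumes the standard LoRA parameterization, effective weight W + BA with W frozen, and illustrates one plausible way to swap a trainable column of B for a new vector without changing the layer's output: the difference is folded into the frozen weight. The helper names (`effective_weight`, `switch_column`, `lora_trainable_params`) are hypothetical, and the abstract does not specify how or when the paper's method performs its switches.

```python
# Pure-Python sketch (no external libraries); matrices are lists of rows.

def matmul(X, Y):
    """Plain matrix product of X (m x r) and Y (r x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def effective_weight(W, B, A):
    """LoRA effective weight: frozen W plus the low-rank update B @ A."""
    BA = matmul(B, A)
    return [[w + u for w, u in zip(rw, ru)] for rw, ru in zip(W, BA)]

def switch_column(W, B, A, k, new_col):
    """Assumed switching step (illustrative only): replace column k of B
    with new_col and fold the removed contribution into W, so that
    W + B @ A is the same before and after the switch."""
    W_new = [[W[i][j] + (B[i][k] - new_col[i]) * A[k][j]
              for j in range(len(W[0]))]
             for i in range(len(W))]
    B_new = [row[:] for row in B]
    for i in range(len(B_new)):
        B_new[i][k] = new_col[i]
    return W_new, B_new

def lora_trainable_params(d_out, d_in, rank):
    """Trainable entries in B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)
```

Under these assumptions, a 4096 x 4096 layer with rank 16 trains 16 * (4096 + 4096) = 131,072 adapter entries rather than 16,777,216 full-weight entries, which is the source of LoRA's memory savings. Switching which vectors are trainable over time lets the accumulated update exceed rank 16 while the per-step trainable count stays fixed.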
