CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation
The full-size MLPs and the projection layers in attention account for most of the parameters in large language models (LLMs), consuming extensive computational resources during pre-training. We empirically observe that the activations of pre-trained LLMs exhibit low-rank structure. Motivated by this observation, we propose CoLA and its memory-efficient implementation, CoLA-M, which replace these full-size layers with compute-efficient auto-encoders that naturally enforce low-rank activations throughout training. This fundamental architectural change eliminates activation redundancy and significantly boosts model capacity and training efficiency. Experiments on LLaMA models with 60 million to 7 billion parameters show that CoLA reduces computing cost and improves training throughput while maintaining full-rank-level performance. CoLA-M further reduces memory cost without sacrificing throughput, offering a pre-training approach with collectively superior parameter, computing, and memory efficiency. The resulting LLMs are also smaller, enabling faster inference with lower memory cost on resource-constrained platforms.
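The abstract does not spell out the layer design, but a minimal sketch of the core idea, replacing a full-size projection with a narrow auto-encoder-style bottleneck so that activations are low-rank by construction, could look like the following. The module name `CoLALinear`, the rank choice, and the SiLU bottleneck nonlinearity are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn as nn


class CoLALinear(nn.Module):
    """Hypothetical low-rank replacement for a full-size linear layer.

    The d_out x d_in weight is factored into a down-projection A (d_in -> r)
    and an up-projection B (r -> d_out), with a nonlinearity on the rank-r
    bottleneck, so the layer acts as a small auto-encoder whose activations
    are low-rank by construction.
    """

    def __init__(self, d_in: int, d_out: int, rank: int):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)   # encoder: project to the rank-r space
        self.act = nn.SiLU()                            # bottleneck nonlinearity (assumed)
        self.up = nn.Linear(rank, d_out, bias=False)    # decoder: project back to d_out

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.act(self.down(x)))


# Rough per-token cost of one projection with d_in = d_out = d and rank r:
#   full-size:  d * d multiply-adds
#   low-rank:   2 * d * r multiply-adds  -> cheaper whenever r < d / 2
if __name__ == "__main__":
    layer = CoLALinear(d_in=4096, d_out=4096, rank=1024)
    x = torch.randn(2, 16, 4096)   # (batch, sequence, hidden)
    print(layer(x).shape)          # torch.Size([2, 16, 4096])
```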