Large language models (LLMs) are revolutionizing many science and engineering fields. However, their enormous model sizes impose extremely demanding computational requirements during pre-training. Although low-rank factorizations can reduce model parameters, their direct application in LLM pre-training often leads to non-negligible performance loss. To address this fundamental challenge, we introduce CoLA and its memory-efficient implementation, CoLA-M. We leverage the low-rank structure widely observed in model activations and enforce non-linear transformations between factorized weight matrices to reduce model size while boosting model capacity and training efficiency. Experiments on LLaMA models with 60 million to 7 billion parameters show that CoLA reduces the computing cost and improves training throughput while maintaining full-rank-level performance. CoLA-M further reduces memory cost without sacrificing throughput, offering a pre-training approach with collectively superior parameter, computing, and memory efficiency. The resulting LLMs are also smaller, enabling faster inference with lower memory cost on resource-constrained platforms.
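The core architectural idea described in the abstract is to replace a dense weight matrix with two low-rank factors and enforce a non-linear transformation between them, acting on the low-rank activation. Below is a minimal PyTorch sketch of such a layer; the class name, the chosen rank, and the SiLU activation are illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn as nn

class LowRankActivationLinear(nn.Module):
    """Replaces a dense d_in -> d_out linear map with B(sigma(A x)),
    where A: d_in -> r, B: r -> d_out, and r << min(d_in, d_out).
    Parameter count drops from d_in * d_out to r * (d_in + d_out)."""

    def __init__(self, d_in: int, d_out: int, rank: int, bias: bool = False):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=bias)   # low-rank factor A
        self.act = nn.SiLU()                           # non-linearity between the factors (illustrative choice)
        self.up = nn.Linear(rank, d_out, bias=bias)    # low-rank factor B

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.act(self.down(x)))

# Example: a 4096 -> 4096 projection at rank 512 uses ~4.2M parameters
# instead of ~16.8M for the dense layer.
layer = LowRankActivationLinear(4096, 4096, rank=512)
y = layer(torch.randn(2, 16, 4096))
print(y.shape)  # torch.Size([2, 16, 4096])
```

Because the non-linearity sits between the factors rather than after a reconstructed full-rank product, the factorized layer is not merely a compressed approximation of a dense matrix but a distinct, more expressive parameterization, which is how the abstract frames the gain in model capacity.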
@article{liu2025_2502.10940,
  title   = {CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation},
  author  = {Ziyue Liu and Ruijie Zhang and Zhengyang Wang and Zi Yang and Paul Hovland and Bogdan Nicolae and Franck Cappello and Zheng Zhang},
  journal = {arXiv preprint arXiv:2502.10940},
  year    = {2025}
}