Aggressive Post-Training Compression on Extremely Large Language Models

30 September 2024
Zining Zhang
Yao Chen
Bingsheng He
Zhenjie Zhang
Abstract

The increasing size and complexity of Large Language Models (LLMs) pose challenges for their deployment on personal computers and mobile devices. Aggressive post-training compression is necessary to reduce model size, but it often incurs significant accuracy loss. To address this challenge, we propose a novel network pruning technique that applies sparsity above 0.7 together with sub-8-bit quantization. Our approach compresses prevailing LLMs within a few hours while keeping the accuracy loss relatively small. Experimental evaluations demonstrate the method's effectiveness and its potential for practical deployment. By making LLMs available on consumer devices, our work can enable a new era of natural language processing applications with wide-ranging impact.
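
The abstract does not describe the pruning or quantization algorithm itself, so the following is only a minimal sketch of the two ingredients it names: unstructured magnitude pruning to 0.7 sparsity followed by symmetric 4-bit weight quantization, applied post-training to a single weight matrix in PyTorch. The function `compress_weight` and all parameter choices are hypothetical illustrations, not the authors' method.

```python
import torch

def compress_weight(w: torch.Tensor, sparsity: float = 0.7, bits: int = 4) -> torch.Tensor:
    """Prune the smallest-magnitude weights, then quantize the survivors.

    Hypothetical sketch: magnitude pruning + symmetric uniform quantization,
    chosen here only to illustrate ">0.7 sparsity, <8-bit" compression.
    """
    # Pruning: zero out the `sparsity` fraction of weights with smallest |w|.
    k = int(sparsity * w.numel())
    if k > 0:
        threshold = w.abs().flatten().kthvalue(k).values
        w = torch.where(w.abs() > threshold, w, torch.zeros_like(w))

    # Quantization: symmetric uniform, `bits` bits per weight.
    qmax = 2 ** (bits - 1) - 1           # e.g. 7 for signed 4-bit
    scale = w.abs().max() / qmax
    if scale == 0:
        return w
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q * scale                     # dequantized view, for evaluation

# Example: compress a random stand-in "weight matrix" and check sparsity.
w = torch.randn(4096, 4096)
w_c = compress_weight(w)
print(f"sparsity: {(w_c == 0).float().mean().item():.3f}")  # ~0.700
```

In practice, post-training methods in this regime typically also use calibration data to decide which weights to prune and how to set quantization scales per channel or per group; the global, data-free variant above is kept deliberately simple.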
