Large language models (LLMs) exhibit remarkable performance across various natural language processing tasks but suffer from immense computational and memory demands, limiting their deployment in resource-constrained environments. To address this challenge, we propose NoWag (Normalized Weight and Activation Guided Compression), a unified framework for zero-shot shape-preserving compression algorithms. We compress Llama-2 7B/13B/70B and Llama-3 8B/70B models using two popular forms of shape-preserving compression: vector quantization (NoWag-VQ, NoWag for Vector Quantization) and unstructured/semi-structured pruning (NoWag-P, NoWag for Pruning). We find that NoWag-VQ significantly outperforms state-of-the-art zero-shot VQ methods, and that NoWag-P performs competitively against state-of-the-art pruning methods. These results suggest commonalities between these compression paradigms that could inspire future work. Our code is available at this https URL
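To make the "normalized weight and activation guided" idea concrete, below is a minimal sketch of what such a pruning criterion in the spirit of NoWag-P might look like. This is not the authors' implementation: the function names, the per-output-channel weight normalization, and the calibration-based activation scale are all assumptions; the abstract does not specify the exact normalization or scoring rule.

    # Hypothetical sketch of a normalized, activation-guided pruning score.
    # Assumptions (not stated in the abstract): per-row weight normalization,
    # importance weighted by calibration activation norms, magnitude pruning.
    import torch

    def nowag_p_importance(weight: torch.Tensor, acts: torch.Tensor) -> torch.Tensor:
        """weight: (out_features, in_features); acts: (num_samples, in_features)."""
        # Normalize each output channel's weights (assumed normalization).
        w_norm = weight / (weight.norm(dim=1, keepdim=True) + 1e-8)
        # Per-input-channel activation scale from calibration data.
        act_scale = acts.norm(dim=0)  # shape: (in_features,)
        # Importance: normalized weight magnitude times activation scale.
        return w_norm.abs() * act_scale

    def prune_unstructured(weight: torch.Tensor, acts: torch.Tensor,
                           sparsity: float) -> torch.Tensor:
        """Zero out the lowest-importance weights; zero-shot, no retraining.
        Assumes 0 < sparsity < 1."""
        score = nowag_p_importance(weight, acts)
        k = max(1, int(sparsity * score.numel()))
        threshold = score.flatten().kthvalue(k).values
        return weight * (score > threshold)

The same normalized, activation-guided statistics could plausibly drive codebook fitting for vector quantization (NoWag-VQ), which is the commonality between the two paradigms the abstract alludes to.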
@article{liu2025_2504.14569,
  title   = {NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models},
  author  = {Lawrence Liu and Inesh Chakrabarti and Yixiao Li and Mengdi Wang and Tuo Zhao and Lin F. Yang},
  journal = {arXiv preprint arXiv:2504.14569},
  year    = {2025}
}