Achieving binary weight and activation for LLMs using Post-Training Quantization

7 April 2025
Siqing Song
Chuang Wang
Ruiqi Wang
Yi Yang
Xuyao Zhang
Abstract

Quantizing large language models (LLMs) to 1-bit precision significantly reduces computational costs, but existing quantization techniques suffer from noticeable performance degradation when weight and activation precisions fall below 4 bits (W4A4). In this paper, we propose a post-training quantization framework with a W(1+1)A(1×4) configuration, where weights are quantized to 1 bit with an additional 1 bit for fine-grained grouping, and activations are quantized to 1 bit with a 4-fold increase in the number of channels. For weight quantization, we propose Hessian-aware fine-grained grouping combined with an EM-based quantization scheme. For activation quantization, we equivalently decompose INT4-quantized activations into a 4 × INT1 format while smoothing the scaling factors based on quantization errors, which further reduces the activation quantization error. Our method surpasses state-of-the-art (SOTA) LLM quantization baselines on W2A4 across multiple tasks, pushing the boundaries of existing LLM quantization methods toward fully binarized models.
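The activation-side idea can be illustrated in isolation. The following is a minimal NumPy sketch (not the authors' implementation; the function name and shapes are illustrative assumptions) of how an INT4 activation tensor can be rewritten equivalently as four INT1 bit planes with per-plane scales, which is the "4-fold increase in the number of channels" the abstract refers to.

import numpy as np

def decompose_int4_to_binary(x_int4):
    """Decompose INT4 activations (values in [0, 15]) into 4 binary planes.

    Hypothetical illustration: each 4-bit value x is written as
    x = b0*1 + b1*2 + b2*4 + b3*8, so one INT4 channel becomes four
    INT1 channels whose bit-plane weights are folded into the scales.
    """
    assert x_int4.min() >= 0 and x_int4.max() <= 15
    bits = [(x_int4 >> k) & 1 for k in range(4)]          # four binary planes
    scales = np.array([1, 2, 4, 8], dtype=x_int4.dtype)   # per-plane weights
    return np.stack(bits, axis=-1), scales

# Sanity check: the binary planes exactly reconstruct the original INT4 values.
x = np.random.randint(0, 16, size=(2, 8))
planes, scales = decompose_int4_to_binary(x)
assert np.array_equal((planes * scales).sum(axis=-1), x)

Because the decomposition is exact, a matmul against INT4 activations can be computed as four binary matmuls whose outputs are combined with the bit-plane scales; the error-aware smoothing of those scales described in the abstract is not shown here.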

@article{song2025_2504.05352,
  title={Achieving binary weight and activation for LLMs using Post-Training Quantization},
  author={Siqing Song and Chuang Wang and Ruiqi Wang and Yi Yang and Xuyao Zhang},
  journal={arXiv preprint arXiv:2504.05352},
  year={2025}
}