AWP: Activation-Aware Weight Pruning and Quantization with Projected Gradient Descent

11 June 2025
Jing Liu · Toshiaki Koike-Akino · Ye Wang · Hassan Mansour · Matthew Brand
Main: 4 pages · Appendix: 2 pages · Bibliography: 3 pages · 2 figures · 5 tables
Abstract

To address the enormous size of Large Language Models (LLMs), model compression methods, such as quantization and pruning, are often deployed, especially on edge devices. In this work, we focus on layer-wise post-training quantization and pruning. Drawing connections between activation-aware weight pruning and sparse approximation problems, and motivated by the success of Iterative Hard Thresholding (IHT), we propose a unified method for Activation-aware Weight pruning and quantization via Projected gradient descent (AWP). Our experiments demonstrate that AWP outperforms state-of-the-art LLM pruning and quantization methods. Theoretical convergence guarantees of the proposed method for pruning are also provided.
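As a rough illustration of the layer-wise setup the abstract describes, the sketch below shows a generic IHT-style projected-gradient loop for activation-aware pruning: minimize ||XW - XW_hat||_F^2 over W_hat under a per-layer sparsity budget by alternating a gradient step with a hard-thresholding projection. This is not the authors' implementation; the function name awp_prune_sketch, the step-size choice, and the fixed iteration count are illustrative assumptions, and the quantization branch of AWP is omitted.

# Minimal sketch (assumptions noted above), not the paper's code.
import numpy as np

def awp_prune_sketch(W, X, sparsity=0.5, n_iters=100):
    """Prune weight matrix W (d_in x d_out) against calibration activations X (n x d_in)."""
    H = X.T @ X                          # activation Gram matrix (d_in x d_in)
    lr = 1.0 / np.linalg.norm(H, 2)      # step size 1/L from the spectral norm of H
    k = int((1.0 - sparsity) * W.size)   # number of weights to keep
    W_hat = W.copy()
    for _ in range(n_iters):
        # gradient of 0.5 * ||X W - X W_hat||_F^2 with respect to W_hat
        grad = H @ (W_hat - W)
        W_hat = W_hat - lr * grad
        # projection: hard-threshold to the k largest-magnitude entries
        thresh = np.partition(np.abs(W_hat).ravel(), W_hat.size - k)[W_hat.size - k]
        W_hat[np.abs(W_hat) < thresh] = 0.0
    return W_hat

# toy usage on random data
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
X = rng.standard_normal((256, 64))
W_pruned = awp_prune_sketch(W, X, sparsity=0.5)
print("kept fraction:", np.mean(W_pruned != 0))

The gradient step plus hard-thresholding projection is the standard IHT template for sparse approximation that the abstract cites as motivation; the actual AWP method, its quantization projection, and its convergence analysis are detailed in the paper itself.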

@article{liu2025_2506.10205,
  title={AWP: Activation-Aware Weight Pruning and Quantization with Projected Gradient Descent},
  author={Jing Liu and Toshiaki Koike-Akino and Ye Wang and Hassan Mansour and Matthew Brand},
  journal={arXiv preprint arXiv:2506.10205},
  year={2025}
}