ResearchTrend.AI
Thanos: A Block-wise Pruning Algorithm for Efficient Large Language Model Compression

6 April 2025
Ivan Ilin
Peter Richtárik
Abstract

This paper presents Thanos, a novel weight-pruning algorithm designed to reduce the memory footprint and enhance the computational efficiency of large language models (LLMs) by removing redundant weights while maintaining accuracy. Thanos introduces a block-wise pruning strategy with adaptive masks that dynamically adjust to weight importance, enabling flexible sparsity patterns and structured formats, such as n:m sparsity, optimized for hardware acceleration. Experimental evaluations demonstrate that Thanos achieves state-of-the-art performance in structured pruning and outperforms existing methods in unstructured pruning. By providing an efficient and adaptable approach to model compression, Thanos offers a practical solution for deploying large models in resource-constrained environments.
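To make the n:m structured-sparsity format mentioned in the abstract concrete, here is a minimal sketch (not the Thanos algorithm itself, which uses block-wise updates with adaptive masks): in every group of m consecutive weights, only the n largest-magnitude entries are kept and the rest are zeroed. The function name `prune_n_m` and the magnitude-based importance criterion are illustrative assumptions, not taken from the paper.

```python
# Illustrative n:m magnitude pruning (a simplification, NOT Thanos itself):
# in each group of m consecutive weights, keep the n largest-magnitude
# entries and zero out the rest. The 2:4 pattern shown here is the format
# commonly accelerated by sparse tensor hardware.

def prune_n_m(weights, n=2, m=4):
    """Apply n:m magnitude pruning to a flat list of weights."""
    assert len(weights) % m == 0, "length must be a multiple of m"
    pruned = []
    for i in range(0, len(weights), m):
        group = weights[i:i + m]
        # Indices of the n largest-magnitude entries within this group.
        keep = sorted(range(m), key=lambda j: abs(group[j]), reverse=True)[:n]
        pruned.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return pruned

row = [0.9, -0.1, 0.4, -0.7, 0.05, 0.6, -0.3, 0.2]
print(prune_n_m(row))  # -> [0.9, 0.0, 0.0, -0.7, 0.0, 0.6, -0.3, 0.0]
```

Because exactly n of every m weights survive, the sparsity ratio is fixed at (m - n)/m per group, which is what makes the pattern predictable enough for hardware acceleration.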

@article{ilin2025_2504.05346,
  title={Thanos: A Block-wise Pruning Algorithm for Efficient Large Language Model Compression},
  author={Ivan Ilin and Peter Richtárik},
  journal={arXiv preprint arXiv:2504.05346},
  year={2025}
}