LPViT: Low-Power Semi-structured Pruning for Vision Transformers

2 July 2024

Kaixin Xu

Zhe Wang

Min Wu

Xiaoli Li

Weisi Lin

Abstract

Vision transformers (ViTs) have emerged as a promising alternative to convolutional neural networks for various image analysis tasks, offering comparable or superior performance. However, one significant drawback of ViTs is their resource-intensive nature, which leads to increased memory footprint, computational complexity, and power consumption. To democratize this high-performance technology and make it more environmentally friendly, it is essential to compress ViT models, reducing their resource requirements while maintaining high performance. In this paper, we introduce a new block-structured pruning scheme to address the resource-intensive nature of ViTs, offering a balanced trade-off between accuracy and hardware acceleration. Unlike unstructured pruning or channel-wise structured pruning, block pruning leverages the block-wise structure of linear layers, resulting in more efficient matrix multiplications. To optimize this pruning scheme, we propose a novel hardware-aware learning objective, tailored to the block sparsity structure, that simultaneously maximizes speedup and minimizes power consumption during inference. This objective eliminates the need for empirical look-up tables and focuses solely on reducing parametrized layer connections. Moreover, we provide a lightweight algorithm for post-training pruning of ViTs, which uses a second-order Taylor approximation and empirical optimization to solve the proposed hardware-aware objective. Extensive experiments on ImageNet across various ViT architectures, including DeiT-B and DeiT-S, demonstrate performance competitive with other pruning methods and a remarkable balance between accuracy preservation and power savings. In particular, we achieve 3.93x speedup on dedicated hardware and GPUs respectively for DeiT-B, and a power reduction of 1.4x on GPUs. Code is released at this https URL.
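
To make the notion of block sparsity concrete, the following is a minimal sketch of pruning a linear layer's weight matrix in whole blocks rather than in single weights or whole channels. It is not the method of the paper: the paper scores blocks with a second-order Taylor approximation under a hardware-aware objective, whereas this sketch assumes a plain squared-L2-norm score as a stand-in. The function name `block_prune`, the 4x4 block shape, and the 50% sparsity level are all hypothetical choices made for illustration.

```python
import random


def block_prune(weight, block_rows, block_cols, sparsity):
    """Zero out the lowest-scoring blocks of a 2-D weight matrix (list of lists).

    Returns the pruned copy and the set of (row, col) origins of dropped blocks.
    """
    rows, cols = len(weight), len(weight[0])
    if rows % block_rows or cols % block_cols:
        raise ValueError("matrix shape must be divisible by the block shape")

    # Top-left corner of every block in the grid.
    origins = [(i, j)
               for i in range(0, rows, block_rows)
               for j in range(0, cols, block_cols)]

    def score(origin):
        # Stand-in saliency: squared L2 norm of the block.
        # ASSUMPTION: the paper uses a second-order Taylor score instead.
        i0, j0 = origin
        return sum(weight[i][j] ** 2
                   for i in range(i0, i0 + block_rows)
                   for j in range(j0, j0 + block_cols))

    # Drop the requested fraction of blocks, lowest score first.
    n_prune = round(sparsity * len(origins))
    dropped = set(sorted(origins, key=score)[:n_prune])

    pruned = [row[:] for row in weight]
    for i0, j0 in dropped:
        for i in range(i0, i0 + block_rows):
            for j in range(j0, j0 + block_cols):
                pruned[i][j] = 0.0
    return pruned, dropped


random.seed(0)
w = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(8)]
w_pruned, dropped = block_prune(w, block_rows=4, block_cols=4, sparsity=0.5)
```

Because zeros appear only as complete 4x4 tiles, a block-sparse matrix multiplication kernel can skip each dropped tile entirely, which is the source of the hardware benefit that block pruning targets.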

@article{xu2025_2407.02068,
  title={LPViT: Low-Power Semi-structured Pruning for Vision Transformers},
  author={Kaixin Xu and Zhe Wang and Chunyun Chen and Xue Geng and Jie Lin and Mohamed M. Sabry Aly and Xulei Yang and Min Wu and Xiaoli Li and Weisi Lin},
  journal={arXiv preprint arXiv:2407.02068},
  year={2025}
}