SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference

6 October 2024
Yuan Zhang, Chun-Kai Fan, Junpeng Ma, Wenzhao Zheng, Tao Huang, Kuan Cheng, Denis A. Gudovskiy, Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer, Shanghang Zhang
Abstract

In vision-language models (VLMs), visual tokens usually account for a significant amount of computational overhead despite carrying sparser information than text tokens. To address this, most existing methods learn a network that prunes redundant visual tokens using additional training data. In contrast, we propose SparseVLM, a text-guided, training-free token optimization mechanism that eliminates the need for extra parameters or fine-tuning costs. Since visual tokens complement text tokens in a VLM's linguistic reasoning, we select relevant text tokens to rate the significance of visual tokens via self-attention matrices, and then prune visual tokens with the proposed strategy to maximize sparsity while retaining information. In particular, we introduce a rank-based strategy to adaptively determine the sparsification ratio for each layer, alongside a token recycling method that compresses pruned tokens into more compact representations. Experimental results show that SparseVLM improves the efficiency of various VLMs across a range of image and video understanding tasks. For example, LLaVA equipped with SparseVLM achieves a 54% reduction in FLOPs and a 37% decrease in CUDA latency while maintaining 97% of its original accuracy. Our code is available at this https URL.
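The mechanism described above (rating visual tokens by the attention they receive from selected text tokens, pruning the low-scoring ones, and recycling the pruned tokens into a few compact representations) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the fixed keep_ratio (the paper instead adapts the ratio per layer with a rank-based strategy), and the chunk-and-average recycling step are assumptions made purely for illustration.

import torch

def sparsify_visual_tokens(visual_tokens, attn_text_to_visual,
                           keep_ratio=0.5, num_recycled=4):
    """Minimal sketch of text-guided visual token pruning.

    visual_tokens:        (N_v, D) visual token embeddings at some layer
    attn_text_to_visual:  (N_t, N_v) attention weights from selected text
                          tokens to visual tokens
    keep_ratio:           fraction of visual tokens to keep (fixed here for
                          illustration; the paper adapts it per layer)
    num_recycled:         number of compact tokens built from pruned ones
    """
    n_visual = visual_tokens.size(0)
    n_keep = max(1, int(n_visual * keep_ratio))

    # Rate each visual token by the attention it receives from text tokens.
    significance = attn_text_to_visual.mean(dim=0)          # (N_v,)

    keep_idx = significance.topk(n_keep).indices
    prune_mask = torch.ones(n_visual, dtype=torch.bool)
    prune_mask[keep_idx] = False

    kept = visual_tokens[keep_idx]                           # (n_keep, D)
    pruned = visual_tokens[prune_mask]                       # (n_prune, D)

    # "Recycle" pruned tokens by grouping them into a few chunks and
    # averaging each chunk (an assumed aggregation, not the paper's exact one).
    if pruned.numel() > 0:
        chunks = pruned.chunk(min(num_recycled, pruned.size(0)), dim=0)
        recycled = torch.stack([c.mean(dim=0) for c in chunks])
        kept = torch.cat([kept, recycled], dim=0)
    return kept

# Example with random tensors (16 text tokens, 576 visual tokens, dim 1024):
vis = torch.randn(576, 1024)
attn = torch.rand(16, 576)
compressed = sparsify_visual_tokens(vis, attn, keep_ratio=0.3)
print(compressed.shape)  # roughly (172 kept + 4 recycled, 1024)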

@article{zhang2025_2410.04417,
  title={SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference},
  author={Yuan Zhang and Chun-Kai Fan and Junpeng Ma and Wenzhao Zheng and Tao Huang and Kuan Cheng and Denis Gudovskiy and Tomoyuki Okuno and Yohei Nakata and Kurt Keutzer and Shanghang Zhang},
  journal={arXiv preprint arXiv:2410.04417},
  year={2025}
}