Low-Bit Integerization of Vision Transformers using Operand Reordering for Efficient Hardware

11 April 2025
Ching-Yi Lin
Sahil Shah
Abstract

Pre-trained vision transformers have achieved remarkable performance across various visual tasks but suffer from expensive computational and memory costs. While model quantization reduces memory usage by lowering precision, these models still incur significant computational overhead due to dequantization before matrix operations. In this work, we analyze the computation graph and propose an integerization process based on operation reordering. Specifically, the process delays dequantization until after matrix operations. This enables integerized matrix multiplication and linear modules that directly process the quantized input. To validate our approach, we synthesize the self-attention module of a ViT on systolic array-based hardware. Experimental results show that our low-bit inference reduces per-PE power consumption for linear layers and matrix multiplication, bridging the gap between quantized models and efficient inference.
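As a concrete illustration of the reordering idea, consider symmetric linear quantization, where X ≈ s_x · X_q and W ≈ s_w · W_q, so that X · W ≈ (s_x · s_w) · (X_q · W_q). The sketch below is not the paper's implementation; the function names and the NumPy setting are illustrative. It contrasts the conventional dequantize-then-multiply order with the reordered integer-multiply-then-rescale order, where the matrix product runs entirely in low-bit integer arithmetic, as it would on a systolic array.

import numpy as np

def quantize(t, bits=8):
    """Symmetric per-tensor quantization: returns integer values and a scale.
    (Illustrative helper, not from the paper.)"""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(t).max() / qmax
    q = np.clip(np.round(t / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 16)).astype(np.float32)   # activations
W = rng.standard_normal((16, 8)).astype(np.float32)   # weights

Xq, sx = quantize(X)
Wq, sw = quantize(W)

# Conventional order: dequantize each operand, then multiply in floating point.
y_dequant_first = (Xq * sx) @ (Wq * sw)

# Reordered: integer matmul on the quantized operands, one rescale at the end.
y_int_first = (Xq @ Wq) * (sx * sw)

# Both orders compute the same product (up to floating-point rounding).
assert np.allclose(y_dequant_first, y_int_first, atol=1e-4)

Because the two orders are algebraically identical, the reordering trades a floating-point matrix multiply for an integer one plus a single scalar rescale, which is where the per-PE power savings reported in the abstract come from.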

@article{lin2025_2504.18547,
  title={Low-Bit Integerization of Vision Transformers using Operand Reordering for Efficient Hardware},
  author={Ching-Yi Lin and Sahil Shah},
  journal={arXiv preprint arXiv:2504.18547},
  year={2025}
}