ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation

4 June 2024
Tianchen Zhao
Tongcheng Fang
Haofeng Huang
Enshu Liu
Rui Wan
Widyadewi Soedarmadji
Shiyao Li
Zinan Lin
Guohao Dai
Shengen Yan
Huazhong Yang
Xuefei Ning
Yu Wang
Communities: MQ, VGen
Abstract

Diffusion transformers have demonstrated remarkable performance in visual generation tasks, such as generating realistic images or videos from textual instructions. However, larger model sizes and multi-frame processing for video generation increase computational and memory costs, posing challenges for practical deployment on edge devices. Post-Training Quantization (PTQ) is an effective method for reducing memory cost and computational complexity, yet we find that existing quantization methods face challenges when applied to text-to-image and video tasks. To address these challenges, we systematically analyze the sources of quantization error and identify the unique challenges posed by DiT quantization. Accordingly, we design an improved quantization scheme, ViDiT-Q (Video & Image Diffusion Transformer Quantization), tailored specifically for DiT models. We validate the effectiveness of ViDiT-Q across a variety of text-to-image and video models, achieving W8A8 and W4A8 quantization with negligible degradation in visual quality and metrics. Additionally, we implement efficient GPU kernels that deliver a practical 2-2.5x memory saving and a 1.4-1.7x end-to-end latency speedup.

@article{zhao2025_2406.02540,
  title={ ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation },
  author={ Tianchen Zhao and Tongcheng Fang and Haofeng Huang and Enshu Liu and Rui Wan and Widyadewi Soedarmadji and Shiyao Li and Zinan Lin and Guohao Dai and Shengen Yan and Huazhong Yang and Xuefei Ning and Yu Wang },
  journal={arXiv preprint arXiv:2406.02540},
  year={ 2025 }
}