Hardware-Friendly Static Quantization Method for Video Diffusion Transformers

20 February 2025

Abstract

Diffusion Transformers for video generation have attracted significant research interest since the impressive performance of SORA. Efficient deployment of such generative-AI models on GPUs has been demonstrated with dynamic quantization. However, resource-constrained devices cannot support dynamic quantization and need static quantization of the models for efficient deployment on AI processors. In this paper, we propose a novel method for the post-training quantization of OpenSora, a Video Diffusion Transformer, without relying on dynamic quantization techniques. Our approach employs static quantization and achieves video quality comparable to FP16 and to the dynamically quantized ViDiT-Q method, as measured by CLIP and VQA metrics. In particular, we use per-step calibration data to provide a post-training statically quantized model for each time step, with channel-wise quantization for weights and tensor-wise quantization for activations. By further applying the smooth-quantization technique, we obtain high-quality video outputs with the statically quantized models. Extensive experimental results demonstrate that static quantization can be a viable alternative to dynamic quantization for video diffusion transformers, offering a more efficient approach without sacrificing performance.
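The ingredients named in the abstract (per-step calibration, channel-wise weight scales, tensor-wise activation scales, and smooth quantization) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy data, the symmetric int8 range, and the smoothing exponent `alpha = 0.5` are all assumptions made for illustration, and a single linear layer stands in for the transformer.

```python
from typing import Dict, List

Matrix = List[List[float]]
QMAX = 127   # assumption: symmetric int8 range [-127, 127]
EPS = 1e-8   # guards against division by zero on all-zero channels


def quantize(values: List[float], scale: float) -> List[int]:
    """Round to integers with a fixed (static) scale and clip to the int8 range."""
    return [max(-QMAX, min(QMAX, round(v / scale))) for v in values]


def smoothing_factors(acts: Matrix, weight: Matrix, alpha: float = 0.5) -> List[float]:
    """Per-input-channel factors s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)."""
    factors = []
    for j in range(len(weight[0])):
        a_max = max(max(abs(row[j]) for row in acts), EPS)
        w_max = max(max(abs(row[j]) for row in weight), EPS)
        factors.append((a_max ** alpha) / (w_max ** (1.0 - alpha)))
    return factors


def build_step_model(acts: Matrix, weight: Matrix, alpha: float = 0.5) -> dict:
    """Fix all quantization parameters offline for one denoising step."""
    s = smoothing_factors(acts, weight, alpha)
    # Move activation outliers into the weights: X' = X / s, W' = W * s.
    smooth_acts = [[v / sj for v, sj in zip(row, s)] for row in acts]
    smooth_w = [[w * sj for w, sj in zip(row, s)] for row in weight]
    # Tensor-wise: one scale for the whole activation tensor.
    act_scale = max(max(abs(v) for row in smooth_acts for v in row), EPS) / QMAX
    # Channel-wise: one scale per output channel (row) of the weight.
    w_scales = [max(max(abs(w) for w in row), EPS) / QMAX for row in smooth_w]
    q_weight = [quantize(row, sc) for row, sc in zip(smooth_w, w_scales)]
    return {"smooth": s, "act_scale": act_scale,
            "w_scales": w_scales, "q_weight": q_weight}


def quantized_linear(x: List[float], model: dict) -> List[float]:
    """Integer matmul with precomputed scales; nothing is measured at run time."""
    x_q = quantize([v / sj for v, sj in zip(x, model["smooth"])], model["act_scale"])
    out = []
    for q_row, w_scale in zip(model["q_weight"], model["w_scales"]):
        acc = sum(a * w for a, w in zip(x_q, q_row))  # integer accumulation
        out.append(acc * model["act_scale"] * w_scale)  # dequantize
    return out


# Toy layer (2 outputs x 2 inputs) and hypothetical calibration activations,
# keyed by denoising step. Step 0 has a larger outlier than step 1.
weight: Matrix = [[0.5, -0.25], [0.1, 0.2]]
calib: Dict[int, Matrix] = {
    0: [[8.0, 0.5], [-4.0, 0.25]],
    1: [[2.0, 0.5], [-1.0, 0.25]],
}

# One statically quantized model per time step.
models = {step: build_step_model(acts, weight) for step, acts in calib.items()}

x = calib[0][0]
y_q = quantized_linear(x, models[0])
y_fp = [sum(w * v for w, v in zip(row, x)) for row in weight]
```

Because every scale is computed from calibration data ahead of time, inference needs no run-time range estimation, which is the property that distinguishes static from dynamic quantization. Keeping a separate parameter set per step lets each step's activation range be covered by its own scale instead of one scale stretched across all steps.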


@article{yi2025_2502.15077,
  title={Hardware-Friendly Static Quantization Method for Video Diffusion Transformers},
  author={Sanghyun Yi and Qingfeng Liu and Mostafa El-Khamy},
  journal={arXiv preprint arXiv:2502.15077},
  year={2025}
}
