Reducing Spatial Fitting Error in Distillation of Denoising Diffusion Models

AAAI Conference on Artificial Intelligence (AAAI), 2023

7 November 2023

Abstract

Denoising diffusion models have exhibited remarkable capabilities in image generation. However, generating high-quality samples requires a large number of iterations. Knowledge distillation for diffusion models is an effective way to address this limitation by shortening the sampling process, but it degrades generative quality. Based on our analysis with bias-variance decomposition and experimental observations, we attribute the degradation to the spatial fitting error that occurs in the training of both the teacher and the student models. Accordingly, we propose the Spatial Fitting-Error Reduction Distillation model (SFERD). SFERD utilizes attention guidance from the teacher model and a designed semantic gradient predictor to reduce the student's fitting error. Empirically, our proposed model facilitates high-quality sample generation in a few function evaluations. We achieve an FID of 5.31 on CIFAR-10 and 9.39 on ImageNet 64×64 with only one step, outperforming existing diffusion methods. Our study provides a new perspective on diffusion distillation by highlighting the intrinsic denoising ability of models. Project link: https://github.com/Sainzerjj/SFERD.
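
The abstract refers to a bias-variance decomposition without stating it. As a reminder of the general idea only, the sketch below checks the textbook scalar identity MSE = bias² + variance for predictions of a fixed target. It is not the decomposition used in the paper's analysis, and the function name `bias_variance`, the offset of 0.3, and the spread of 0.5 are hypothetical values chosen for illustration.

```python
import random


def bias_variance(predictions, target):
    """Split the mean squared error of `predictions` against a fixed
    `target` into squared bias and (population) variance."""
    n = len(predictions)
    mean_pred = sum(predictions) / n
    mse = sum((p - target) ** 2 for p in predictions) / n
    bias_sq = (mean_pred - target) ** 2
    variance = sum((p - mean_pred) ** 2 for p in predictions) / n
    return mse, bias_sq, variance


rng = random.Random(0)  # fixed seed so the run is repeatable
target = 1.0

# Hypothetical predictor: systematically off by 0.3, with Gaussian spread 0.5.
predictions = [target + 0.3 + rng.gauss(0.0, 0.5) for _ in range(10_000)]

mse, bias_sq, variance = bias_variance(predictions, target)
print(f"mse={mse:.4f}  bias^2={bias_sq:.4f}  variance={variance:.4f}")
```

With the population variance (dividing by n), the identity holds exactly up to floating-point rounding, so the systematic part of the error (bias) and the sample-to-sample part (variance) can be read off separately.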
