MSF: Efficient Diffusion Model Via Multi-Scale Latent Factorize

1 July 2025

Main: 10 Pages

3 Figures

Bibliography: 4 Pages

4 Tables

Abstract

While diffusion-based generative models have made significant strides in visual content creation, conventional approaches are computationally expensive, especially for high-resolution images, because they denoise the entire image from noisy inputs. This contrasts with signal processing techniques, such as Fourier and wavelet analyses, which often employ hierarchical decompositions. Inspired by these principles, particularly the idea of signal separation, we introduce a diffusion framework that leverages multi-scale latent factorization. Our framework decomposes the denoising target, typically latent features from a pretrained Variational Autoencoder, into a low-frequency base signal that captures core structural information and a high-frequency residual signal that contributes finer details such as textures. This decomposition directly informs our two-stage image generation process, which first produces the low-resolution base and then generates the high-resolution residual. Because residual information is inherently easier to model, the residual learning stage can use fewer sampling steps, which confers an advantage over conventional full-resolution generation techniques. Decomposing the signal into a base and a residual, conceptually akin to how wavelet analysis separates frequency bands, also yields a more streamlined and intuitive design than generic hierarchical models. Our method, MSF (Multi-Scale Factorization), achieves FID scores of 2.08 ($256\times256$) and 2.47 ($512\times512$) on class-conditional ImageNet benchmarks, outperforming the DiT baseline (2.27 and 3.04, respectively) while also delivering a $4\times$ speed-up with the same number of sampling steps.
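The base/residual split described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not specify the operators, so the average-pool downsampling, nearest-neighbour upsampling, factor of 2, and all function names here are assumptions chosen for illustration, not the authors' implementation. A random array stands in for a VAE latent.

```python
import numpy as np

def downsample(z, factor=2):
    """Average-pool a (C, H, W) array by `factor` (assumed operator)."""
    c, h, w = z.shape
    assert h % factor == 0 and w % factor == 0
    return z.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))

def upsample(z, factor=2):
    """Nearest-neighbour upsampling of a (C, H, W) array (assumed operator)."""
    return z.repeat(factor, axis=1).repeat(factor, axis=2)

def factorize(latent, factor=2):
    """Split a latent into a low-resolution base and a full-resolution residual."""
    base = downsample(latent, factor)            # low-frequency structure
    residual = latent - upsample(base, factor)   # what the base cannot represent
    return base, residual

def reconstruct(base, residual, factor=2):
    """Invert `factorize`: upsample the base and add the residual back."""
    return upsample(base, factor) + residual

rng = np.random.default_rng(0)
latent = rng.standard_normal((4, 32, 32))  # hypothetical stand-in for a VAE latent
base, residual = factorize(latent)
recon = reconstruct(base, residual)
```

Under these assumed operators the split is exactly invertible, and the residual averages to zero within each pooling block, so it carries only the detail missing from the base. A two-stage sampler in this spirit would generate `base` first and then `residual` conditioned on it.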

@article{xu2025_2501.13349,
  title={MSF: Efficient Diffusion Model Via Multi-Scale Latent Factorize},
  author={Haohang Xu and Longyu Chen and Yichen Zhang and Shuangrui Ding and Zhipeng Zhang},
  journal={arXiv preprint arXiv:2501.13349},
  year={2025}
}
