Disentanglement in T-space for Faster and Distributed Training of Diffusion Models with Fewer Latent-states

Main: 8 Pages
23 Figures
Bibliography: 2 Pages
1 Table
Appendix: 19 Pages
Abstract

We challenge a fundamental assumption of diffusion models, namely, that a large number of latent states or time-steps is required for training so that the reverse generative process is close to a Gaussian. We first show that, with careful selection of a noise schedule, diffusion models trained over a small number of latent states (i.e., $T \sim 32$) match the performance of models trained over a much larger number of latent states ($T \sim 1,000$). Second, we push this limit (on the minimum number of latent states required) to a single latent state, which we refer to as complete disentanglement in T-space. We show that high-quality samples can be easily generated by the disentangled model obtained by combining several independently trained single latent-state models. We provide extensive experiments to show that the proposed disentangled model yields 4-6$\times$ faster convergence, measured across a variety of metrics on two different datasets.
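To make the idea of combining independently trained single latent-state models concrete, the sketch below trains one small denoiser per noise level and chains them at sampling time. This is only an illustrative reading of the abstract, not the paper's method: the network `TinyDenoiser`, the coarse `alpha_bars` schedule, the toy data, and the DDIM-style deterministic update are all hypothetical choices made for the example.

```python
# Hypothetical sketch: independent per-latent-state denoisers composed at sampling time.
# Model, noise schedule, and data are placeholders, not the paper's implementation.
import torch
import torch.nn as nn


class TinyDenoiser(nn.Module):
    """Stand-in for a per-latent-state denoising network (hypothetical)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, dim))

    def forward(self, x):
        return self.net(x)


def train_single_state(data, alpha_bar, steps=200, dim=2, lr=1e-3):
    """Train one denoiser for a single fixed noise level alpha_bar.

    Since each model only ever sees its own latent state, the models for
    different states can be trained independently (e.g., on separate workers).
    """
    model = TinyDenoiser(dim)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        x0 = data[torch.randint(len(data), (64,))]
        eps = torch.randn_like(x0)
        # Forward-diffuse clean samples directly to this latent state.
        xt = alpha_bar.sqrt() * x0 + (1 - alpha_bar).sqrt() * eps
        loss = ((model(xt) - eps) ** 2).mean()  # standard eps-prediction loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model


@torch.no_grad()
def sample(models, alpha_bars, dim=2, n=16):
    """Chain the independently trained models in a DDIM-like deterministic reverse pass."""
    x = torch.randn(n, dim)
    for t in reversed(range(len(models))):
        a_bar = alpha_bars[t]
        a_bar_prev = alpha_bars[t - 1] if t > 0 else torch.tensor(1.0)
        eps_hat = models[t](x)
        x0_hat = (x - (1 - a_bar).sqrt() * eps_hat) / a_bar.sqrt()  # predicted clean sample
        x = a_bar_prev.sqrt() * x0_hat + (1 - a_bar_prev).sqrt() * eps_hat
    return x


# Example: T = 4 latent states with a coarse (made-up) noise schedule.
data = torch.randn(1024, 2)
alpha_bars = torch.tensor([0.9, 0.6, 0.3, 0.05])
models = [train_single_state(data, ab) for ab in alpha_bars]
samples = sample(models, alpha_bars)
```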
