Disentanglement in T-space for Faster and Distributed Training of Diffusion Models with Fewer Latent-states

We challenge a fundamental assumption of diffusion models, namely, that a large number of latent states or time-steps is required for training so that the reverse generative process is close to a Gaussian. We first show that, with careful selection of a noise schedule, diffusion models trained over a small number of latent states match the performance of models trained over a much larger number of latent states. Second, we push this limit on the minimum number of latent states down to a single latent state, which we refer to as complete disentanglement in T-space. We show that high-quality samples can be easily generated by the disentangled model obtained by combining several independently trained single latent-state models. We provide extensive experiments showing that the proposed disentangled model yields 4-6x faster convergence, measured across a variety of metrics on two different datasets.
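The disentangled scheme described above can be illustrated with a toy sketch: each time-step gets its own epsilon-predictor, trained in complete isolation (and hence trainable in a distributed fashion), and the independently trained models are then chained in the standard DDPM reverse process. Everything concrete here (the 1-D Gaussian data, the linear per-step predictor, the particular noise schedule) is an illustrative assumption, not the paper's actual architecture or schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "dataset": samples from N(2, 0.25). With a linear epsilon-predictor
# per time-step this case is solvable in closed form, so the idea can be shown
# without a neural network (an illustrative assumption, not the paper's setup).
x0 = rng.normal(2.0, 0.5, size=(4096, 1))

T = 4                                  # a small number of latent states
betas = np.linspace(0.1, 0.8, T)       # assumed (aggressive) noise schedule
alphas = 1.0 - betas
abar = np.cumprod(alphas)              # cumulative products \bar{alpha}_t

def train_single_t(t):
    """Fit one epsilon-predictor for time-step t, independently of all others."""
    eps = rng.normal(size=x0.shape)
    xt = np.sqrt(abar[t]) * x0 + np.sqrt(1 - abar[t]) * eps  # forward noising
    # Closed-form least squares for eps ≈ w * xt + b
    A = np.hstack([xt, np.ones_like(xt)])
    w, b = np.linalg.lstsq(A, eps, rcond=None)[0].ravel()
    return w, b

# Complete disentanglement in T-space: each per-step model could be trained
# on a different machine with no communication between them.
models = [train_single_t(t) for t in range(T)]

def sample(n):
    """Chain the independently trained models in the usual DDPM reverse process."""
    x = rng.normal(size=(n, 1))
    for t in reversed(range(T)):
        w, b = models[t]
        eps_hat = w * x + b
        x = (x - betas[t] / np.sqrt(1 - abar[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            x += np.sqrt(betas[t]) * rng.normal(size=(n, 1))
    return x

samples = sample(2000)
print(samples.mean(), samples.std())   # should land near the data's (2.0, 0.5)
```

With only four latent states the chained samples approximately recover the toy data distribution, which is the point of the sketch: training is embarrassingly parallel across t, while sampling still composes the models sequentially.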