
GMem: A Modular Approach for Ultra-Efficient Generative Models

Main: 9 pages · 13 figures · 7 tables · Bibliography: 5 pages · Appendix: 9 pages
Abstract

Recent studies indicate that the denoising process in deep generative diffusion models implicitly learns and memorizes semantic information from the data distribution. These findings suggest that capturing more complex data distributions requires larger neural networks, leading to a substantial increase in computational demands, which in turn becomes the primary bottleneck in both training and inference of diffusion models. To this end, we introduce GMem: a modular approach for ultra-efficient generative models. GMem decouples memory capacity from the model and implements it as a separate, immutable memory set that preserves the essential semantic information in the data. By reducing the network's burden of memorizing the complex data distribution, this design improves training efficiency, sampling efficiency, and generation diversity. On ImageNet at 256×256 resolution, GMem achieves a 50× training speedup compared to SiT, reaching FID = 7.66 in fewer than 28 epochs (~4 hours of training), while SiT requires 1400 epochs. Without classifier-free guidance, GMem achieves state-of-the-art (SoTA) performance of FID = 1.53 in 160 epochs with only ~20 hours of training, outperforming LightningDiT, which requires 800 epochs and ~95 hours to attain FID = 2.17.
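To illustrate the core idea of an external, immutable memory set, here is a minimal sketch (not the paper's implementation): a frozen bank of semantic vectors is built once, and a toy denoising step conditions on a retrieved entry rather than relying on network weights to memorize the data distribution. All names, dimensions, and the update rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical immutable memory bank: K semantic vectors of dimension D.
# In a GMem-style design, such a bank would be constructed once from the
# data and then frozen, separate from the denoising network's parameters.
K, D = 1024, 16
memory_bank = rng.standard_normal((K, D))
memory_bank /= np.linalg.norm(memory_bank, axis=1, keepdims=True)

def retrieve(query: np.ndarray, bank: np.ndarray) -> np.ndarray:
    """Return the bank entry with the highest cosine similarity to `query`."""
    q = query / np.linalg.norm(query)
    scores = bank @ q          # cosine similarities (rows are unit-norm)
    return bank[np.argmax(scores)]

# Toy usage: nudge a noisy sample toward the retrieved memory entry.
# This placeholder update stands in for conditioning a real sampler.
noisy = rng.standard_normal(D)
snippet = retrieve(noisy, memory_bank)
denoised = noisy - 0.5 * (noisy - snippet)
print(denoised.shape)
```

The design point the sketch mirrors is separation of concerns: the bank carries semantic information about the data, so the network itself can stay small and fast to train.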
