Simpler Diffusion (SiD2): 1.5 FID on ImageNet512 with pixel-space diffusion

25 October 2024
Emiel Hoogeboom, Thomas Mensink, Jonathan Heek, Kay Lamerigts, Ruiqi Gao, Tim Salimans
Abstract

Latent diffusion models have become the popular choice for scaling up diffusion models for high-resolution image synthesis. Compared to pixel-space models that are trained end-to-end, latent models are perceived to be more efficient and to produce higher image quality at high resolution. Here we challenge these notions, and show that pixel-space models can be very competitive with latent models in both quality and efficiency, achieving 1.5 FID on ImageNet512 and new SOTA results on ImageNet128, ImageNet256 and Kinetics600. We present a simple recipe for scaling end-to-end pixel-space diffusion models to high resolutions:

  1. Use the sigmoid loss-weighting (Kingma & Gao, 2023) with our prescribed hyper-parameters.
  2. Use our simplified memory-efficient architecture with fewer skip-connections.
  3. Scale the model to favor processing the image at a high resolution with fewer parameters, rather than using more parameters at a lower resolution.

Combining these with guidance intervals, we obtain a family of pixel-space diffusion models we call Simpler Diffusion (SiD2).
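To make the first recipe item concrete, below is a minimal NumPy sketch of the sigmoid loss-weighting idea from Kingma & Gao (2023), applied to a plain epsilon-prediction MSE. The weighting form w(lambda) = sigmoid(bias - lambda) over the log-SNR lambda follows that paper; the bias value, function names, and the choice of an epsilon-prediction loss are illustrative assumptions, not the prescribed SiD2 hyper-parameters.

import numpy as np

def sigmoid_weight(log_snr, bias=-1.0):
    # w(lambda) = sigmoid(bias - lambda), where lambda is the per-example
    # log signal-to-noise ratio. The bias shifts which noise levels the
    # loss emphasizes; its value here is a placeholder.
    return 1.0 / (1.0 + np.exp(log_snr - bias))

def sigmoid_weighted_loss(eps_pred, eps_true, log_snr, bias=-1.0):
    # Per-example MSE over all non-batch axes, then re-weighted by the
    # sigmoid schedule and averaged over the batch.
    sq_err = (eps_pred - eps_true) ** 2
    per_example = sq_err.reshape(sq_err.shape[0], -1).mean(axis=1)
    return np.mean(sigmoid_weight(log_snr, bias) * per_example)

# Toy usage: a batch of 4 "images" of shape 3x8x8 with hypothetical log-SNRs.
eps_true = np.random.randn(4, 3, 8, 8)
eps_pred = eps_true + 0.1 * np.random.randn(4, 3, 8, 8)
log_snr = np.random.uniform(-10.0, 10.0, size=4)
loss = sigmoid_weighted_loss(eps_pred, eps_true, log_snr)

The sketch only illustrates how a log-SNR-dependent weight re-balances the training loss across noise levels; the paper's specific parameterization and hyper-parameters should be taken from the paper itself.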

@article{hoogeboom2025_2410.19324,
  title={Simpler Diffusion (SiD2): 1.5 FID on ImageNet512 with pixel-space diffusion},
  author={Emiel Hoogeboom and Thomas Mensink and Jonathan Heek and Kay Lamerigts and Ruiqi Gao and Tim Salimans},
  journal={arXiv preprint arXiv:2410.19324},
  year={2025}
}