Stochastic Forward-Backward Deconvolution: Training Diffusion Models with Finite Noisy Datasets

8 February 2025
Haoye Lu
Qifan Wu
Yaoliang Yu
Abstract

Recent diffusion-based generative models achieve remarkable results by training on massive datasets, yet this practice raises concerns about memorization and copyright infringement. A proposed remedy is to train exclusively on noisy versions of data with potential copyright issues, ensuring the model never observes the original content. However, through the lens of deconvolution theory, we show that although it is theoretically feasible to learn the data distribution from noisy samples, the practical challenge of collecting sufficiently many samples makes successful learning nearly unattainable. To overcome this limitation, we propose pretraining the model on a small fraction of clean data to guide the deconvolution process. Combined with our Stochastic Forward-Backward Deconvolution (SFBD) method, we attain an FID of 6.31 on CIFAR-10 with just 4% clean images (and 3.58 with 10%). Theoretically, we prove that SFBD guides the model to learn the true data distribution. The result also highlights the importance of pretraining on limited but clean data, or alternatively on similar datasets. Empirical studies further support these findings and offer additional insights.
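The deconvolution framing above can be made concrete: if each released sample is y = x + σε with Gaussian noise ε ~ N(0, I), then the observed density is the clean density convolved with a Gaussian kernel, so recovering the clean distribution amounts to deconvolution. A minimal 1-D NumPy sketch (an illustration of this convolution structure, not the authors' SFBD implementation; the bimodal toy distribution and noise level σ = 0.5 are assumptions) shows the telltale variance relation Var[y] = Var[x] + σ²:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5  # hypothetical known noise level of the released noisy dataset

# Hypothetical "clean" data drawn from a bimodal 1-D distribution.
x = np.concatenate([rng.normal(-2.0, 0.3, 50_000),
                    rng.normal(2.0, 0.3, 50_000)])

# Only the noisy version y = x + sigma * eps is observed by the model.
y = x + sigma * rng.normal(size=x.shape)

# The observed density is the clean density convolved with N(0, sigma^2),
# so the variances add: Var[y] ≈ Var[x] + sigma^2.
gap = np.var(y) - np.var(x)
print(gap)  # close to sigma**2 = 0.25, up to sampling error
```

Inverting this convolution from finitely many noisy samples is the hard part: deconvolution is ill-posed, which is why the abstract argues that a small clean pretraining set is needed to anchor the process.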

@article{lu2025_2502.05446,
  title={Stochastic Forward-Backward Deconvolution: Training Diffusion Models with Finite Noisy Datasets},
  author={Haoye Lu and Qifan Wu and Yaoliang Yu},
  journal={arXiv preprint arXiv:2502.05446},
  year={2025}
}