
DiffDoctor: Diagnosing Image Diffusion Models Before Treating

Abstract

Despite recent progress, image diffusion models still produce artifacts. A common remedy is to optimize the model with feedback from quality assessment systems or human annotators, where images are typically rated in their entirety. In this work, we argue that problem-solving starts with identification: the model should be aware not only of the presence of defects in an image but also of their specific locations. Motivated by this, we propose DiffDoctor, a two-stage pipeline that helps image diffusion models generate fewer artifacts. Concretely, the first stage develops a robust artifact detector, for which we collect a dataset of over 1M flawed synthesized images and set up an efficient human-in-the-loop annotation process with a carefully designed class-balancing strategy. The learned artifact detector is then used in the second stage to optimize the diffusion model by providing pixel-level feedback. Extensive experiments on text-to-image diffusion models demonstrate the effectiveness of our artifact detector as well as the soundness of our diagnose-then-treat design.
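The abstract describes turning a detector's per-pixel artifact map into a training signal for the generator. As a minimal illustrative sketch (not the authors' implementation; the detector, weighting, and loss form here are all assumptions), one could penalize an image in proportion to the detector's artifact probability at each pixel, so that flagged regions dominate the feedback rather than a single whole-image score:

```python
import numpy as np

def artifact_weighted_penalty(image, detector):
    """Hypothetical pixel-level feedback: the detector returns an
    H x W map of artifact probabilities in [0, 1]; the penalty is
    its mean, so flawed regions contribute most to the signal that
    would be backpropagated into the diffusion model."""
    heatmap = detector(image)  # per-pixel artifact probabilities
    return float(heatmap.mean())

# Toy stand-in detector (hypothetical): flags over-bright pixels.
def toy_detector(image):
    return (image > 0.8).astype(float)

rng = np.random.default_rng(0)
image = rng.random((8, 8))
penalty = artifact_weighted_penalty(image, toy_detector)
print(penalty)  # scalar in [0, 1] to be minimized
```

In contrast to image-level ratings, this kind of spatial map tells the optimizer *where* the defects are, which is the "diagnose before treating" idea the abstract emphasizes.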

@article{wang2025_2501.12382,
  title={DiffDoctor: Diagnosing Image Diffusion Models Before Treating},
  author={Yiyang Wang and Xi Chen and Xiaogang Xu and Sihui Ji and Yu Liu and Yujun Shen and Hengshuang Zhao},
  journal={arXiv preprint arXiv:2501.12382},
  year={2025}
}