Diffusion Probe: Generated Image Result Prediction Using CNN Probes

Benlei Cui
Bukun Huang
Zhizeng Ye
Xuemei Dong
Tuo Chen
Hui Xue
Dingkang Yang
Longtao Huang
Jingqun Tang
Haiwen Hong
Main: 9 pages, Bibliography: 2 pages, Appendix: 8 pages, 10 figures, 8 tables

Abstract

Text-to-image (T2I) diffusion models lack an efficient mechanism for early quality assessment, leading to costly trial and error in multi-generation scenarios such as prompt iteration, agent-based generation, and Flow-GRPO. We reveal a strong correlation between cross-attention distributions early in the diffusion process and final image quality. Based on this finding, we introduce Diffusion Probe, a framework that uses internal cross-attention maps as predictive signals.
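
To make the idea of probing cross-attention maps concrete, the following is a minimal sketch in PyTorch, not the architecture used in the paper. It assumes the attention maps from an early denoising step have already been extracted and stacked into a tensor of shape (batch, tokens, height, width). The class name `AttentionProbe`, the token count of 77, the 16x16 map resolution, and the layer sizes are all illustrative assumptions.

```python
import torch
from torch import nn


class AttentionProbe(nn.Module):
    """Hypothetical CNN probe: cross-attention maps in, scalar quality score out."""

    def __init__(self, num_tokens: int = 77, hidden: int = 32):
        super().__init__()
        # Treat each prompt token's attention map as one input channel (assumption).
        self.features = nn.Sequential(
            nn.Conv2d(num_tokens, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pool to (batch, hidden, 1, 1)
        )
        self.head = nn.Linear(hidden, 1)

    def forward(self, attn_maps: torch.Tensor) -> torch.Tensor:
        x = self.features(attn_maps).flatten(1)  # (batch, hidden)
        return self.head(x).squeeze(-1)          # (batch,) predicted quality scores


torch.manual_seed(0)
probe = AttentionProbe()
# Random stand-in for early-step cross-attention maps, normalized over tokens.
attn = torch.rand(4, 77, 16, 16).softmax(dim=1)
scores = probe(attn)  # untrained, so the values carry no meaning yet
```

In practice such a probe would be trained as a regressor against quality scores of the fully generated images, so that unpromising generations can be stopped after only a few denoising steps.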
