SPICE: A Synergistic, Precise, Iterative, and Customizable Image Editing Workflow

13 April 2025
Kenan Tang
Yanhong Li
Yao Qin
Abstract

Recent prompt-based image editing models have demonstrated impressive prompt-following capability at structural editing tasks. However, existing models still fail to perform local edits, follow detailed editing prompts, or maintain global image quality beyond a single editing step. To address these challenges, we introduce SPICE, a training-free workflow that accepts arbitrary resolutions and aspect ratios, accurately follows user requirements, and improves image quality consistently during more than 100 editing steps. By synergizing the strengths of a base diffusion model and a Canny edge ControlNet model, SPICE robustly handles free-form editing instructions from the user. SPICE outperforms state-of-the-art baselines on a challenging realistic image-editing dataset consisting of semantic editing (object addition, removal, replacement, and background change), stylistic editing (texture changes), and structural editing (action change) tasks. Not only does SPICE achieve the highest quantitative performance according to standard evaluation metrics, but it is also consistently preferred by users over existing image-editing methods. We release the workflow implementation for popular diffusion model Web UIs to support further research and artistic exploration.
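The abstract describes a training-free workflow that combines a base diffusion model with a Canny edge ControlNet and applies edits iteratively. As a rough illustration of that idea only (not the authors' released implementation), the sketch below chains prompt-guided img2img steps through a Canny-conditioned ControlNet using Hugging Face diffusers; the specific checkpoints, the `edit_step` helper, and the strength and step settings are illustrative assumptions.

```python
# Hypothetical sketch of one iterative, structure-preserving editing step:
# a base diffusion img2img pipeline guided by a Canny edge ControlNet.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

# Example checkpoints (assumptions, not necessarily those used by SPICE).
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

def edit_step(image: Image.Image, prompt: str, strength: float = 0.6) -> Image.Image:
    """Apply one prompt-guided edit while a Canny map constrains global structure."""
    edges = cv2.Canny(np.array(image), 100, 200)          # single-channel edge map
    control = Image.fromarray(np.stack([edges] * 3, axis=-1))  # 3-channel control image
    return pipe(
        prompt=prompt,
        image=image,
        control_image=control,
        strength=strength,
        num_inference_steps=30,
    ).images[0]

# Iterative use: feed each output back in as the next input.
img = Image.open("input.png").convert("RGB")
for p in ["add a red scarf", "change the background to a beach"]:
    img = edit_step(img, p)
img.save("edited.png")
```

In this sketch the edge map is recomputed from the current image at every step, so structural cues track the evolving result rather than the original input; how SPICE balances the base model and the ControlNet across many steps is detailed in the paper itself.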

@article{tang2025_2504.09697,
  title={SPICE: A Synergistic, Precise, Iterative, and Customizable Image Editing Workflow},
  author={Kenan Tang and Yanhong Li and Yao Qin},
  journal={arXiv preprint arXiv:2504.09697},
  year={2025}
}