GroundingBooth: Grounding Text-to-Image Customization

Main: 8 pages, Bibliography: 2 pages, Appendix: 5 pages, 14 figures, 6 tables

Abstract

Recent approaches in text-to-image customization have primarily focused on preserving the identity of the input subject, but often fail to control the spatial location and size of objects. We introduce GroundingBooth, which achieves zero-shot, instance-level spatial grounding on both foreground subjects and background objects in the text-to-image customization task. Our proposed grounding module and subject-grounded cross-attention layer enable the creation of personalized images with accurate layout alignment, identity preservation, and strong text-image coherence. In addition, our model seamlessly supports personalization with multiple subjects. Our model shows strong results in both layout-guided image synthesis and text-to-image customization tasks. The project page is available at this https URL.
