
Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions

Abstract

Recent advances in text-to-image (T2I) diffusion models have significantly improved the quality of generated images. However, providing efficient control over individual subjects, particularly the attributes characterizing them, remains a key challenge. While existing methods have introduced mechanisms to modulate attribute expression, they typically provide either detailed, object-specific localization of such a modification or full-scale fine-grained, nuanced control of attributes. No current approach offers both simultaneously, resulting in a gap when trying to achieve precise, continuous, and subject-specific attribute modulation in image generation. In this work, we demonstrate that token-level directions exist within commonly used CLIP text embeddings that enable fine-grained, subject-specific control of high-level attributes in T2I models. We introduce two methods to identify these directions: a simple, optimization-free technique and a learning-based approach that utilizes the T2I model to characterize semantic concepts more specifically. Our methods allow the augmentation of the prompt text input, enabling fine-grained control over multiple attributes of individual subjects simultaneously, without requiring any modifications to the diffusion model itself. This approach offers a unified solution that fills the gap between global and localized control, providing competitive flexibility and precision in text-guided image generation. Project page: this https URL. Code is available at this https URL.
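
To make the abstract's core idea concrete, the sketch below illustrates how a token-level semantic direction might be estimated without optimization and applied with a continuous strength, assuming a standard CLIP text encoder from the transformers library. The prompt pair, the subject_index helper, and the edit_strength scale are illustrative assumptions, not the authors' exact implementation.

# Hedged sketch: estimate a per-token semantic direction as the difference of
# CLIP token embeddings between a prompt with and without an attribute word,
# then add a scaled copy of that direction to the subject token of a new prompt.
# Model name, prompts, and edit_strength are illustrative choices.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").eval()

def token_embeddings(prompt: str) -> torch.Tensor:
    """Per-token CLIP text embeddings for a prompt, shape (1, seq_len, dim)."""
    tokens = tokenizer(prompt, padding="max_length",
                       max_length=tokenizer.model_max_length, return_tensors="pt")
    with torch.no_grad():
        return text_encoder(tokens.input_ids).last_hidden_state

def subject_index(prompt: str, subject: str) -> int:
    """Position of the (single-token) subject word in the tokenized prompt."""
    ids = tokenizer(prompt).input_ids
    subj_id = tokenizer(subject, add_special_tokens=False).input_ids[0]
    return ids.index(subj_id)

# Contrastive prompt pair that differs only in the attribute of interest.
base_prompt = "a photo of a person"
attr_prompt = "a photo of an elderly person"

# Semantic direction for the attribute, read off at the subject token.
direction = (token_embeddings(attr_prompt)[0, subject_index(attr_prompt, "person")]
             - token_embeddings(base_prompt)[0, subject_index(base_prompt, "person")])

# Apply the direction to a new prompt with a continuous strength.
target_prompt = "a person walking a dog in a park"
edit_strength = 0.6  # hypothetical continuous control value

cond = token_embeddings(target_prompt).clone()
cond[0, subject_index(target_prompt, "person")] += edit_strength * direction

In a typical pipeline, the edited cond tensor would then be passed to the diffusion model as its text conditioning (e.g., in place of the usual prompt embeddings), so only the targeted subject token carries the attribute shift while the rest of the prompt is unchanged.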

@article{baumann2025_2403.17064,
  title={Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions},
  author={Stefan Andreas Baumann and Felix Krause and Michael Neumayr and Nick Stracke and Melvin Sevi and Vincent Tao Hu and Björn Ommer},
  journal={arXiv preprint arXiv:2403.17064},
  year={2025}
}