TIPO: Text to Image with Text Presampling for Prompt Optimization

TIPO (Text-to-Image Prompt Optimization) is an efficient approach to automatic prompt refinement in text-to-image (T2I) generation. Starting from simple user prompts, TIPO uses a lightweight pre-trained model to expand them into richer, more detailed versions. Conceptually, TIPO samples refined prompts from a targeted sub-distribution within the broader semantic space, preserving the original intent while significantly improving visual quality, coherence, and detail. Unlike resource-intensive methods based on large language models (LLMs) or reinforcement learning (RL), TIPO is computationally efficient and scalable, opening new possibilities for effective, automated prompt engineering in T2I tasks. We provide visual results and a human preference report to investigate TIPO's effectiveness. Experimental evaluations on benchmark datasets demonstrate substantial improvements in aesthetic quality, a significant reduction in visual artifacts, and enhanced alignment with target distributions, along with significantly higher human preference. These results highlight the importance of targeted prompt engineering in text-to-image tasks and indicate broader opportunities for automated prompt refinement.
@article{yeh2025_2411.08127,
  title={TIPO: Text to Image with Text Presampling for Prompt Optimization},
  author={Shih-Ying Yeh and Sang-Hyun Park and Yi Li and Giyeong Oh and Xuehai Wang and Min Song and Youngjae Yu},
  journal={arXiv preprint arXiv:2411.08127},
  year={2025}
}