Image Super-Resolution with Text Prompt Diffusion

Image super-resolution (SR) methods typically model degradation to improve reconstruction accuracy in complex and unknown degradation scenarios. However, extracting degradation information from low-resolution images is challenging, which limits model performance. To boost image SR performance, one feasible approach is to introduce additional priors. Inspired by advances in multi-modal methods and text-prompt image processing, we introduce text prompts to image SR to provide degradation priors. Specifically, we first design a text-image generation pipeline that integrates text into the SR dataset through a text degradation representation and a degradation model. By adopting a discrete design, the text representation is flexible and user-friendly. Meanwhile, we propose PromptSR to realize text-prompt SR. PromptSR leverages a recent multi-modal large language model (MLLM) to generate prompts from low-resolution images. It also utilizes a pre-trained language model (e.g., T5 or CLIP) to enhance text comprehension. We train PromptSR on the text-image dataset. Extensive experiments indicate that introducing text prompts into SR yields impressive results on both synthetic and real-world images. Code: this https URL.
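To illustrate the idea of a discrete, user-friendly text degradation representation, the sketch below bins degradation parameters into coarse levels and renders them as a short prompt that a text encoder (e.g., T5 or CLIP) could consume. This is a minimal illustration only: the function names, parameters, bin thresholds, and vocabulary are assumptions for exposition, not the paper's actual representation.

```python
# Hypothetical sketch: map degradation parameters to a discrete text prompt.
# Thresholds, vocabulary, and parameter names are illustrative assumptions.

def level(value, thresholds=(0.3, 0.7)):
    """Bin a normalized degradation strength in [0, 1] into a discrete level."""
    if value < thresholds[0]:
        return "light"
    if value < thresholds[1]:
        return "medium"
    return "heavy"

def degradation_prompt(blur=0.0, noise=0.0, downsample=2, jpeg=0.0):
    """Render degradation parameters as a short, discrete text prompt."""
    parts = [
        f"{level(blur)} blur",
        f"{level(noise)} noise",
        f"downsample x{downsample}",
        f"{level(jpeg)} compression",
    ]
    return ", ".join(parts)

print(degradation_prompt(blur=0.8, noise=0.2, downsample=4, jpeg=0.5))
# -> heavy blur, light noise, downsample x4, medium compression
```

A discrete design like this keeps prompts enumerable and easy for users to write or edit by hand, while remaining compatible with standard text encoders.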
@article{chen2025_2311.14282,
  title={Image Super-Resolution with Text Prompt Diffusion},
  author={Zheng Chen and Yulun Zhang and Jinjin Gu and Xin Yuan and Linghe Kong and Guihai Chen and Xiaokang Yang},
  journal={arXiv preprint arXiv:2311.14282},
  year={2025}
}