Tell Me What You See: Text-Guided Real-World Image Denoising

Image reconstruction from noisy sensor measurements is a challenging problem for which many methods have been proposed. Most approaches focus on learning robust natural-image priors while modeling the scene's noise statistics, yet in extremely low-light conditions they often remain insufficient. Additional information is needed, such as multiple captures. As an alternative, we propose using a text-based description of the scene as an additional prior, something the photographer can easily provide. Inspired by the remarkable success of text-guided diffusion models in image generation, we show that adding image-caption information significantly improves image denoising and reconstruction for both synthetic and real-world images.
@article{yosef2025_2312.10191,
  title={Tell Me What You See: Text-Guided Real-World Image Denoising},
  author={Erez Yosef and Raja Giryes},
  journal={arXiv preprint arXiv:2312.10191},
  year={2025}
}