Boosting Diffusion-Based Text Image Super-Resolution Model Towards Generalized Real-World Scenarios

10 March 2025
Chenglu Pan
Xiaogang Xu
Ganggui Ding
Yunke Zhang
Wenbo Li
Jiarong Xu
Qingbiao Wu
Abstract

Restoring low-resolution text images presents a significant challenge, as it requires maintaining both the fidelity and stylistic realism of the text in the restored images. Existing text image restoration methods often fall short in challenging scenarios: traditional super-resolution models cannot guarantee clarity, while diffusion-based methods fail to maintain fidelity. In this paper, we introduce a novel framework aimed at improving the generalization ability of diffusion models for text image super-resolution (SR), with a particular focus on fidelity. First, we propose a progressive data sampling strategy that incorporates diverse image types at different stages of training, stabilizing convergence and improving generalization. For the network architecture, we leverage a pre-trained SR prior to provide robust spatial reasoning capabilities, enhancing the model's ability to preserve textual information. Additionally, we employ a cross-attention mechanism to better integrate textual priors. To further reduce errors in the textual priors, we use confidence scores to dynamically adjust the importance of textual features during training. Extensive experiments on real-world datasets demonstrate that our approach not only produces text images with more realistic visual appearance but also improves the accuracy of the text structure.
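The confidence-guided fusion of textual priors described above can be illustrated with a small sketch. The module below is a hypothetical PyTorch illustration, not the authors' released code: it assumes image tokens from the SR branch, text-prior embeddings from a recognizer, and per-token recognition confidences, and it down-weights low-confidence textual features inside a cross-attention block.

# Hypothetical sketch of confidence-weighted cross-attention between image
# features and textual-prior embeddings; all names, shapes, and dimensions
# are assumptions for illustration, not the paper's implementation.
import torch
import torch.nn as nn

class ConfidenceGatedCrossAttention(nn.Module):
    def __init__(self, img_dim: int, text_dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(
            embed_dim=img_dim, kdim=text_dim, vdim=text_dim,
            num_heads=num_heads, batch_first=True)
        self.norm = nn.LayerNorm(img_dim)

    def forward(self, img_tokens, text_tokens, text_conf):
        # img_tokens:  (B, N, img_dim)   spatial features from the SR branch
        # text_tokens: (B, T, text_dim)  embeddings of the recognized text prior
        # text_conf:   (B, T)            per-token recognition confidence in [0, 1]
        # Scaling the values by confidence down-weights unreliable textual priors
        # before they are fused back into the image features.
        attn_out, _ = self.attn(query=img_tokens, key=text_tokens,
                                value=text_tokens * text_conf.unsqueeze(-1))
        return self.norm(img_tokens + attn_out)

# Usage with dummy tensors:
fuse = ConfidenceGatedCrossAttention(img_dim=256, text_dim=512)
img = torch.randn(2, 1024, 256)
txt = torch.randn(2, 32, 512)
conf = torch.rand(2, 32)
out = fuse(img, txt, conf)   # (2, 1024, 256)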

@article{pan2025_2503.07232,
  title={Boosting Diffusion-Based Text Image Super-Resolution Model Towards Generalized Real-World Scenarios},
  author={Chenglu Pan and Xiaogang Xu and Ganggui Ding and Yunke Zhang and Wenbo Li and Jiarong Xu and Qingbiao Wu},
  journal={arXiv preprint arXiv:2503.07232},
  year={2025}
}