While vision transformers have achieved significant breakthroughs in various image restoration (IR) tasks, it is still challenging to efficiently scale them across multiple types of degradations and resolutions. In this paper, we propose Fractal-IR, a fractal-based design that progressively refines degraded images by repeatedly expanding local information into broader regions. This fractal architecture naturally captures local details at early stages and seamlessly transitions toward global context in deeper fractal stages, removing the need for computationally heavy long-range self-attention mechanisms. Moreover, we observe challenges in scaling up vision transformers for IR tasks. Through a series of analyses, we identify a holistic set of strategies to effectively guide model scaling. Extensive experimental results show that Fractal-IR achieves state-of-the-art performance in seven common image restoration tasks, including super-resolution, denoising, JPEG artifact removal, IR in adverse weather conditions, motion deblurring, defocus deblurring, and demosaicking. For SR on Manga109, Fractal-IR achieves a 0.21 dB PSNR gain. For grayscale image denoising on Urban100, Fractal-IR surpasses the previous method by 0.2 dB.
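The abstract's core mechanism, attending locally at early stages and over progressively broader regions at deeper stages, can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class names, window sizes, and the choice of non-overlapping window self-attention are illustrative assumptions used to show how global context can emerge across stages without full-image self-attention.

```python
import torch
import torch.nn as nn

class WindowSelfAttention(nn.Module):
    """Self-attention restricted to non-overlapping windows (illustrative)."""
    def __init__(self, dim, window, heads=4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):  # x: (B, H, W, C); H, W divisible by window size
        B, H, W, C = x.shape
        w = self.window
        # Partition the feature map into non-overlapping w x w windows.
        xw = x.view(B, H // w, w, W // w, w, C)
        xw = xw.permute(0, 1, 3, 2, 4, 5).reshape(-1, w * w, C)
        y = self.norm(xw)
        y, _ = self.attn(y, y, y)
        xw = xw + y  # residual connection within each window
        # Reverse the window partition back to (B, H, W, C).
        xw = xw.view(B, H // w, W // w, w, w, C)
        return xw.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)

class FractalStageStack(nn.Module):
    """Stages with geometrically growing windows: local detail first,
    broader context in deeper stages (a sketch of the fractal idea)."""
    def __init__(self, dim=64, windows=(4, 8, 16)):
        super().__init__()
        self.stages = nn.ModuleList(
            WindowSelfAttention(dim, w) for w in windows
        )

    def forward(self, x):
        for stage in self.stages:
            x = stage(x)
        return x

if __name__ == "__main__":
    feats = torch.randn(1, 32, 32, 64)  # (B, H, W, C) feature map
    out = FractalStageStack()(feats)
    print(out.shape)  # torch.Size([1, 32, 32, 64])
```

Under these assumptions, each stage's attention cost stays quadratic only in the window size rather than in the image size, which is what allows the design to scale across resolutions without long-range self-attention.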
@article{li2025_2503.17825,
  title   = {Fractal-IR: A Unified Framework for Efficient and Scalable Image Restoration},
  author  = {Yawei Li and Bin Ren and Jingyun Liang and Rakesh Ranjan and Mengyuan Liu and Nicu Sebe and Ming-Hsuan Yang and Luca Benini},
  journal = {arXiv preprint arXiv:2503.17825},
  year    = {2025}
}