LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis

Abstract

We introduce LeX-Art, a comprehensive suite for high-quality text-image synthesis that systematically bridges the gap between prompt expressiveness and text rendering fidelity. Our approach follows a data-centric paradigm, constructing a high-quality data synthesis pipeline based on DeepSeek-R1 to curate LeX-10K, a dataset of 10K high-resolution, aesthetically refined 1024×1024 images. Beyond dataset construction, we develop LeX-Enhancer, a robust prompt enrichment model, and train two text-to-image models, LeX-FLUX and LeX-Lumina, achieving state-of-the-art text rendering performance. To systematically evaluate visual text generation, we introduce LeX-Bench, a benchmark that assesses fidelity, aesthetics, and alignment, complemented by Pairwise Normalized Edit Distance (PNED), a novel metric for robust text accuracy evaluation. Experiments demonstrate significant improvements, with LeX-Lumina achieving a 79.81% PNED gain on CreateBench, and LeX-FLUX outperforming baselines in color (+3.18%), positional (+4.45%), and font accuracy (+3.81%). Our code, models, datasets, and demo are publicly available.
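The abstract introduces PNED only by name; its precise formulation is given in the paper body rather than here. As a non-authoritative sketch, the Python snippet below shows one plausible way a pairwise normalized edit distance between rendered and reference words could be computed. The greedy word matching, length normalization, and unmatched-word penalty are illustrative assumptions, not the paper's definition.

def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance between two strings.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def pned_sketch(pred_words, ref_words):
    # Hypothetical pairwise score: each reference word is greedily matched to
    # the closest remaining predicted word; distances are normalized by word
    # length, and unmatched words on either side incur a penalty of 1.
    preds = list(pred_words)
    refs = list(ref_words)
    total = 0.0
    for ref in refs:
        if not preds:
            total += 1.0  # reference word with no prediction left to match
            continue
        dists = [levenshtein(ref, p) / max(len(ref), len(p), 1) for p in preds]
        best = min(range(len(dists)), key=dists.__getitem__)
        total += dists[best]
        preds.pop(best)
    total += float(len(preds))  # penalize extra predicted words
    return total / max(len(refs) + len(preds), 1)

Under these assumptions, pned_sketch(["HELLO", "W0RLD"], ["HELLO", "WORLD"]) returns 0.1: one substituted character in a five-letter word, averaged over two words. The actual PNED metric should be taken from the paper and released code rather than this sketch.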

@article{zhao2025_2503.21749,
  title={LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis},
  author={Shitian Zhao and Qilong Wu and Xinyue Li and Bo Zhang and Ming Li and Qi Qin and Dongyang Liu and Kaipeng Zhang and Hongsheng Li and Yu Qiao and Peng Gao and Bin Fu and Zhen Li},
  journal={arXiv preprint arXiv:2503.21749},
  year={2025}
}