WorldGenBench: A World-Knowledge-Integrated Benchmark for Reasoning-Driven Text-to-Image Generation

2 May 2025

Abstract

Recent advances in text-to-image (T2I) generation have achieved impressive results, yet existing models still struggle with prompts that require rich world knowledge and implicit reasoning, both of which are critical for producing semantically accurate, coherent, and contextually appropriate images in real-world scenarios. To address this gap, we introduce WorldGenBench, a benchmark designed to systematically evaluate T2I models' world knowledge grounding and implicit inferential capabilities, covering both the humanities and nature domains. We propose the Knowledge Checklist Score, a structured metric that measures how well generated images satisfy key semantic expectations. Experiments across 21 state-of-the-art models reveal that while diffusion models lead among open-source methods, proprietary auto-regressive models like GPT-4o exhibit significantly stronger reasoning and knowledge integration. Our findings highlight the need for deeper understanding and inference capabilities in next-generation T2I systems. Project Page: this https URL
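
The abstract does not give a formula for the Knowledge Checklist Score. As a rough illustration only, the sketch below assumes one plausible reading: each prompt comes with a checklist of expected semantic elements, and the score is the fraction of items judged present in the generated image. The names `ChecklistItem` and `knowledge_checklist_score`, the example checklist, and the way each item is judged are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ChecklistItem:
    """One expected semantic element for a prompt (hypothetical structure)."""
    description: str
    satisfied: bool  # assumed to come from a human or model judge


def knowledge_checklist_score(items):
    """Return the fraction of checklist items satisfied.

    Assumption: the score is a simple unweighted ratio. The actual
    metric in the paper may weight or aggregate items differently.
    """
    if not items:
        raise ValueError("checklist must contain at least one item")
    return sum(1 for item in items if item.satisfied) / len(items)


# Hypothetical checklist for a single prompt; the items are invented
# for illustration and are not drawn from the benchmark.
example = [
    ChecklistItem("period-appropriate clothing", True),
    ChecklistItem("correct architectural style", True),
    ChecklistItem("plausible local vegetation", False),
    ChecklistItem("no anachronistic objects", True),
]
score = knowledge_checklist_score(example)
print(f"score = {score:.2f}")
```

Under this assumed definition, a per-model result would be obtained by averaging the per-prompt scores over the benchmark.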


@article{zhang2025_2505.01490,
  title={WorldGenBench: A World-Knowledge-Integrated Benchmark for Reasoning-Driven Text-to-Image Generation},
  author={Daoan Zhang and Che Jiang and Ruoshi Xu and Biaoxiang Chen and Zijian Jin and Yutian Lu and Jianguo Zhang and Liang Yong and Jiebo Luo and Shengda Luo},
  journal={arXiv preprint arXiv:2505.01490},
  year={2025}
}
