WorldScore: A Unified Evaluation Benchmark for World Generation

Abstract

We introduce the WorldScore benchmark, the first unified benchmark for world generation. We decompose world generation into a sequence of next-scene generation tasks with explicit camera trajectory-based layout specifications, enabling unified evaluation of diverse approaches from 3D and 4D scene generation to video generation models. The WorldScore benchmark encompasses a curated dataset of 3,000 test examples that span diverse worlds: static and dynamic, indoor and outdoor, photorealistic and stylized. The WorldScore metrics evaluate generated worlds through three key aspects: controllability, quality, and dynamics. Through extensive evaluation of 19 representative models, including both open-source and closed-source ones, we reveal key insights and challenges for each category of models. Our dataset, evaluation code, and leaderboard can be found at this https URL.

@article{duan2025_2504.00983,
  title={WorldScore: A Unified Evaluation Benchmark for World Generation},
  author={Haoyi Duan and Hong-Xing Yu and Sirui Chen and Li Fei-Fei and Jiajun Wu},
  journal={arXiv preprint arXiv:2504.00983},
  year={2025}
}