53
v1v2 (latest)

"PhyWorldBench": A Comprehensive Evaluation of Physical Realism in Text-to-Video Models

Jing Gu
Xian Liu
Yu Zeng
Ashwin Nagarajan
Fangrui Zhu
Daniel Hong
Yue Fan
Qianqi Yan
Kaiwen Zhou
Ming-Yu Liu
Xin Eric Wang
Main:8 Pages
21 Figures
Bibliography:4 Pages
17 Tables
Appendix:23 Pages
Abstract

Video generation models have achieved remarkable progress in creating high-quality, photorealistic content. However, their ability to accurately simulate physical phenomena remains a critical and unresolved challenge. This paper presents PhyWorldBench, a comprehensive benchmark designed to evaluate video generation models based on their adherence to the laws of physics. The benchmark covers multiple levels of physical phenomena, ranging from fundamental principles such as object motion and energy conservation to more complex scenarios involving rigid body interactions and human or animal motion. Additionally, we introduce a novel Anti-Physics category, where prompts intentionally violate real-world physics, enabling the assessment of whether models can follow such instructions while maintaining logical consistency. Besides large-scale human evaluation, we also design a simple yet effective method that utilizes current multimodal large language models to evaluate physics realism in a zero-shot fashion.We evaluate 12 state-of-the-art text-to-video generation models, including five open-source and five proprietary models, with detailed comparison and analysis. Through systematic testing across 1050 curated prompts spanning fundamental, composite, and anti-physics scenarios, we identify pivotal challenges these models face in adhering to real-world physics. We further examine their performance under diverse physical phenomena and prompt types, and derive targeted recommendations for crafting prompts that enhance fidelity to physical principles.

View on arXiv
Comments on this paper