VideoGen-Eval: Agent-based System for Video Generation Evaluation

30 March 2025

Abstract

The rapid advancement of video generation has left existing evaluation systems inadequate for assessing state-of-the-art models, for three main reasons: simple prompts cannot showcase model capabilities, fixed evaluation operators struggle with out-of-distribution (OOD) cases, and computed metrics are misaligned with human preferences. To bridge this gap, we propose VideoGen-Eval, an agent-based evaluation system that integrates LLM-based content structuring, MLLM-based content judgment, and patch tools designed for temporally dense dimensions, enabling dynamic, flexible, and expandable video generation evaluation. We also introduce a video generation benchmark to evaluate existing cutting-edge models and verify the effectiveness of our evaluation system. It comprises 700 structured, content-rich prompts for both text-to-video (T2V) and image-to-video (I2V) generation, and over 12,000 videos generated by more than 20 models, of which 8 cutting-edge models are selected for quantitative evaluation by both the agent and human raters. Extensive experiments show that the proposed agent-based system aligns strongly with human preferences and completes evaluations reliably, and they confirm the diversity and richness of the benchmark.
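The abstract does not state how agreement with human preferences is measured. One common option, assumed here purely for illustration, is pairwise ranking agreement: for every pair of models, check whether the agent and the human raters prefer the same one. The sketch below computes that rate and the derived coefficient (C - D) / (C + D), which is Goodman and Kruskal's gamma and equals Kendall's tau when neither scorer has ties. The function names and score dictionaries are hypothetical, not part of VideoGen-Eval.

```python
from itertools import combinations


def pairwise_agreement(agent: dict[str, float], human: dict[str, float]) -> float:
    """Fraction of model pairs that the agent and humans order the same way.

    Hypothetical helper: pairs tied under either scorer are skipped,
    since there is no preferred order to compare. Returns a value in [0, 1].
    """
    models = sorted(agent.keys() & human.keys())
    concordant = compared = 0
    for a, b in combinations(models, 2):
        d_agent = agent[a] - agent[b]
        d_human = human[a] - human[b]
        if d_agent == 0 or d_human == 0:
            continue  # tie under one scorer: skip this pair
        compared += 1
        if (d_agent > 0) == (d_human > 0):
            concordant += 1
    if compared == 0:
        raise ValueError("no untied model pairs to compare")
    return concordant / compared


def rank_correlation(agent: dict[str, float], human: dict[str, float]) -> float:
    """(C - D) / (C + D) over untied pairs, i.e. Goodman-Kruskal gamma in [-1, 1]."""
    return 2 * pairwise_agreement(agent, human) - 1
```

For example, with made-up scores `agent = {"model_a": 4.1, "model_b": 3.6, "model_c": 2.9}` and `human = {"model_a": 0.82, "model_b": 0.55, "model_c": 0.61}`, the scorers agree on two of the three pairs, giving an agreement rate of 2/3 and a coefficient of 1/3. The scores only need to be comparable within each scorer, so agent ratings and human win rates can live on different scales.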


@article{yang2025_2503.23452,
  title={VideoGen-Eval: Agent-based System for Video Generation Evaluation},
  author={Yuhang Yang and Ke Fan and Shangkun Sun and Hongxiang Li and Ailing Zeng and FeiLin Han and Wei Zhai and Wei Liu and Yang Cao and Zheng-Jun Zha},
  journal={arXiv preprint arXiv:2503.23452},
  year={2025}
}
