
H2VU-Benchmark: A Comprehensive Benchmark for Hierarchical Holistic Video Understanding

Main: 8 pages, 3 figures, 2 tables; Bibliography: 3 pages
Abstract

With the rapid development of multimodal models, the need to assess their video understanding capabilities has grown steadily. However, existing video understanding benchmarks have significant limitations in coverage, task diversity, and scene adaptability, which hinder accurate assessment of models' comprehensive video understanding capabilities. To address this challenge, we propose a hierarchical and holistic video understanding (H2VU) benchmark designed to evaluate comprehension of both general videos and online streaming videos. This benchmark offers three key features:
