Recently, dataset condensation has made significant progress in the image domain. Unlike images, videos possess an additional temporal dimension, which harbors considerable redundant information, making condensation even more crucial. However, video dataset condensation remains an underexplored area. We aim to bridge this gap by providing a large-scale study with systematic design and fair comparison. Specifically, our work delves into three key aspects to provide valuable empirical insights: (1) temporal processing of video data, (2) the evaluation protocol for video dataset condensation, and (3) adaptation of condensation algorithms to the space-time domain. From this study, we derive several intriguing observations: (i) labeling methods greatly influence condensation performance, (ii) simple sliding-window sampling is effective for temporal processing, and (iii) dataset distillation methods perform better in challenging scenarios, while sample selection methods excel in easier ones. Furthermore, we propose a unified evaluation protocol for the fair comparison of different condensation algorithms and achieve state-of-the-art results on four widely-used action recognition datasets: HMDB51, UCF101, SSv2, and K400. Our code is available at this https URL.
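As a concrete illustration of observation (ii), the minimal sketch below shows one plausible form of sliding-window temporal sampling, in which a long video is split into fixed-length, overlapping clips of frame indices. The function name and the `window`/`stride` values are illustrative assumptions, not the paper's actual implementation or hyperparameters.

```python
import numpy as np

def sliding_window_clips(num_frames: int, window: int = 16, stride: int = 8):
    """Split a video of `num_frames` frames into overlapping fixed-length
    clips of frame indices (a sketch; `window` and `stride` are
    illustrative defaults, not values taken from the paper)."""
    # Each window starts `stride` frames after the previous one and must
    # fit entirely within the video.
    starts = range(0, max(num_frames - window, 0) + 1, stride)
    return [np.arange(s, s + window) for s in starts]

# Example: a 64-frame video yields 7 overlapping 16-frame clips.
clips = sliding_window_clips(64)
print(len(clips), clips[0])  # 7 [ 0  1 ... 15]
```

Under this scheme, each clip can then be treated as an independent sample for condensation, which is one simple way a sliding-window strategy could reduce temporal redundancy.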
@article{chen2025_2412.21197,
  title={A Large-Scale Study on Video Action Dataset Condensation},
  author={Yang Chen and Sheng Guo and Bo Zheng and Limin Wang},
  journal={arXiv preprint arXiv:2412.21197},
  year={2025}
}