A Survey: Spatiotemporal Consistency in Video Generation

Video generation, which produces dynamic visual content, pushes the boundaries of Artificial Intelligence Generated Content (AIGC). It presents challenges beyond static image generation: it requires both high-quality individual frames and temporal coherence, so that the spatiotemporal sequence stays consistent. Recent works have aimed to address the spatiotemporal consistency issue in video generation, yet few literature reviews have been organized from this perspective. This gap hinders a deeper understanding of the mechanisms underlying high-quality video generation. In this survey, we systematically review recent advances in video generation, covering five key aspects: foundation models, information representations, generation schemes, post-processing techniques, and evaluation metrics. We focus in particular on how each contributes to maintaining spatiotemporal consistency. Finally, we discuss future directions and challenges in this field, hoping to inspire further efforts to advance video generation.
@article{yin2025_2502.17863,
  title={ASurvey: Spatiotemporal Consistency in Video Generation},
  author={Zhiyu Yin and Kehai Chen and Xuefeng Bai and Ruili Jiang and Juntao Li and Hongdong Li and Jin Liu and Yang Xiang and Jun Yu and Min Zhang},
  journal={arXiv preprint arXiv:2502.17863},
  year={2025}
}