
USTEP: Spatio-Temporal Predictive Learning under A Unified View

Abstract

Spatio-temporal predictive learning plays a crucial role in self-supervised learning, with applications across a diverse range of fields. Previous approaches to temporal modeling fall into two categories: recurrent-based and recurrent-free methods. The former meticulously process frames one by one but neglect short-term spatio-temporal information redundancies, leading to inefficiencies. The latter naively stack frames sequentially, overlooking the inherent temporal dependencies. In this paper, we re-examine these two dominant temporal modeling approaches within spatio-temporal predictive learning and offer a unified perspective. Building upon this analysis, we introduce USTEP (Unified Spatio-TEmporal Predictive learning), a framework that reconciles recurrent-based and recurrent-free methods by integrating both micro-temporal and macro-temporal scales. Extensive experiments on a wide range of spatio-temporal predictive learning tasks demonstrate that USTEP achieves significant improvements over existing temporal modeling approaches, establishing it as a robust solution for diverse spatio-temporal applications.
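The abstract does not specify USTEP's architecture, so the following NumPy sketch only contrasts the two temporal-modeling styles it describes, plus a simple segment-based hybrid in the spirit of combining micro- and macro-temporal scales. Frames are flattened to vectors, and the function names, the `tanh` updates, the random weights, and the segment length `seg_len` are all illustrative assumptions, not the authors' implementation. The point is the number of sequential steps each style needs for a sequence of T frames.

```python
import numpy as np


def recurrent_based(frames, w_x, w_h):
    """Recurrent-based style: update a hidden state once per frame (T sequential steps)."""
    h = np.zeros(w_h.shape[0])
    steps = 0
    for x_t in frames:
        h = np.tanh(w_x @ x_t + w_h @ h)
        steps += 1
    return h, steps


def recurrent_free(frames, w_stack):
    """Recurrent-free style: stack all T frames into one vector and map them in one step."""
    stacked = frames.reshape(-1)  # shape (T * D,)
    return np.tanh(w_stack @ stacked), 1


def segment_hybrid(frames, seg_len, w_seg, w_h):
    """Hypothetical hybrid (not USTEP itself): frames inside a short segment are stacked
    and processed together (micro-temporal), while a hidden state is carried
    recurrently across segments (macro-temporal)."""
    T, D = frames.shape
    assert T % seg_len == 0, "this sketch assumes T is divisible by seg_len"
    h = np.zeros(w_h.shape[0])
    steps = 0
    # Each row holds seg_len consecutive frames concatenated (row-major reshape).
    for seg in frames.reshape(T // seg_len, seg_len * D):
        h = np.tanh(w_seg @ seg + w_h @ h)
        steps += 1
    return h, steps


rng = np.random.default_rng(0)
T, D, H, SEG = 8, 16, 32, 4
frames = rng.standard_normal((T, D))

h_rec, n_rec = recurrent_based(frames, rng.standard_normal((H, D)), rng.standard_normal((H, H)))
h_free, n_free = recurrent_free(frames, rng.standard_normal((H, T * D)))
h_hyb, n_hyb = segment_hybrid(frames, SEG, rng.standard_normal((H, SEG * D)), rng.standard_normal((H, H)))

print(n_rec, n_free, n_hyb)  # 8 1 2
```

With 8 frames and segments of 4, the hybrid needs 2 sequential steps instead of 8, while still passing state forward in time, which the purely stacked variant does not do.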

@article{tan2025_2310.05829,
  title={{USTEP}: Spatio-Temporal Predictive Learning under A Unified View},
  author={Cheng Tan and Jue Wang and Zhangyang Gao and Siyuan Li and Stan Z. Li},
  journal={arXiv preprint arXiv:2310.05829},
  year={2025}
}