Exploring Self-supervised Skeleton-based Action Recognition in Occluded Environments

To integrate action recognition into autonomous robotic systems, it is essential to address challenges such as person occlusions, a common yet often overlooked scenario in existing self-supervised skeleton-based action recognition methods. In this work, we propose IosPSTL, a simple and effective self-supervised learning framework designed to handle occlusions. IosPSTL combines a cluster-agnostic KNN imputer with an Occluded Partial Spatio-Temporal Learning (OPSTL) strategy. First, we pre-train the model on occluded skeleton sequences. Then, we introduce a cluster-agnostic KNN imputer that performs semantic grouping using k-means clustering on sequence embeddings. It imputes missing skeleton data by applying K-Nearest Neighbors in the latent space, using the representations of nearby samples to restore occluded joints. This imputation produces more complete skeleton sequences, which significantly benefits downstream self-supervised models. To further enhance learning, the OPSTL module incorporates Adaptive Spatial Masking (ASM) to make better use of intact, high-quality skeleton sequences during training. Our method achieves state-of-the-art performance on the occluded versions of the NTU-60 and NTU-120 datasets, demonstrating its robustness and effectiveness under challenging conditions. Code is available at this https URL.
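The abstract does not give the imputer's exact procedure, so the following is only a minimal NumPy sketch of one plausible reading of the KNN imputation step. It clusters the sequence embeddings with plain k-means, then fills each occluded joint with the average of that joint in the k nearest sequences (in embedding space) from the same cluster that actually observe it. The function names `kmeans` and `knn_impute`, the array layout (N, T, J, C), the boolean observation mask, restricting neighbours to the same cluster, and the simple averaging rule are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def kmeans(x, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of x."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for c in range(n_clusters):
            if np.any(labels == c):  # keep the old center if a cluster empties
                centers[c] = x[labels == c].mean(0)
    return labels


def knn_impute(seqs, mask, emb, n_clusters=2, k=2):
    """Hypothetical sketch: fill occluded joints from latent-space neighbours.

    seqs: (N, T, J, C) skeleton sequences; occluded entries may hold any value
    mask: (N, T, J) boolean, True where a joint is observed
    emb:  (N, D) sequence embeddings, e.g. from a pre-trained encoder
    """
    emb = np.asarray(emb, dtype=float)
    labels = kmeans(emb, n_clusters)
    out = seqs.astype(float)  # astype returns a copy, so seqs is not modified
    idx = np.arange(len(seqs))
    for i in range(len(seqs)):
        missing = ~mask[i]
        if not missing.any():
            continue
        # Candidate neighbours: the other sequences in the same k-means cluster.
        cand = idx[(labels == labels[i]) & (idx != i)]
        if cand.size == 0:
            continue
        dist = np.linalg.norm(emb[cand] - emb[i], axis=1)
        nbrs = cand[np.argsort(dist)[:k]]
        # Average neighbour coordinates, counting only joints they observe.
        w = mask[nbrs][..., None].astype(float)   # (k, T, J, 1)
        num = (seqs[nbrs] * w).sum(0)             # (T, J, C)
        den = w.sum(0)                            # (T, J, 1)
        fill = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
        # Joints that no neighbour observes are left unchanged.
        can_fill = missing & (den[..., 0] > 0)
        out[i][can_fill] = fill[can_fill]
    return out


# Toy usage: two well-separated clusters in a 2-D embedding space.
emb = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 10.0]])
seqs = np.zeros((4, 1, 2, 1))     # 4 sequences, 1 frame, 2 joints, 1 coordinate
seqs[1, 0, 1, 0] = 5.0
seqs[2:, 0, 1, 0] = 100.0
mask = np.ones((4, 1, 2), dtype=bool)
mask[0, 0, 1] = False             # joint 1 of sequence 0 is occluded
filled = knn_impute(seqs, mask, emb, n_clusters=2, k=2)
print(filled[0, 0, 1, 0])         # 5.0
```

In the toy example, even with k=2 the occluded joint takes only the value from sequence 1, because sequences 2 and 3 fall in the other cluster. Without the cluster restriction, the two nearest neighbours would be sequences 1 and 2, and the imputed value would become 52.5.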
@article{chen2025_2309.12029,
  title={Exploring Self-supervised Skeleton-based Action Recognition in Occluded Environments},
  author={Yifei Chen and Kunyu Peng and Alina Roitberg and David Schneider and Jiaming Zhang and Junwei Zheng and Yufan Chen and Ruiping Liu and Kailun Yang and Rainer Stiefelhagen},
  journal={arXiv preprint arXiv:2309.12029},
  year={2025}
}