Shopformer: Transformer-Based Framework for Detecting Shoplifting via Human Pose

Shoplifting remains a costly issue for the retail sector, but traditional surveillance systems, which are mostly based on human monitoring, are still largely ineffective, with only about 2% of shoplifters being arrested. Existing AI-based approaches rely on pixel-level video analysis, which raises privacy concerns, is sensitive to environmental variations, and demands significant computational resources. To address these limitations, we introduce Shopformer, a novel transformer-based model that detects shoplifting by analyzing pose sequences rather than raw video. We propose a custom tokenization strategy that converts pose sequences into compact embeddings for efficient transformer processing. To the best of our knowledge, this is the first pose-sequence-based transformer model for shoplifting detection. Evaluated on real-world pose data, our method outperforms state-of-the-art anomaly detection models, offering a privacy-preserving and scalable solution for real-time retail surveillance. The code base for this work is available at this https URL.
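The abstract does not specify how the tokenization strategy is implemented, but the general idea of converting a pose sequence into compact embeddings can be sketched as follows. This is a minimal illustrative example, not the paper's method: the window size, embedding dimension, and the use of a random linear projection (in place of learned weights) are all assumptions.

```python
import numpy as np

def tokenize_pose_sequence(poses, window=8, embed_dim=64, rng=None):
    """Turn a pose sequence of shape (T, J, 2) into compact tokens of shape
    (T // window, embed_dim) by flattening non-overlapping temporal windows
    and applying a linear projection. The projection weights here are random
    placeholders; in a trained model they would be learned end to end."""
    rng = np.random.default_rng(0) if rng is None else rng
    T, J, C = poses.shape
    n_tokens = T // window
    # Drop trailing frames that do not fill a complete window.
    clipped = poses[: n_tokens * window].reshape(n_tokens, window * J * C)
    # Linear projection to a compact embedding per window (one token each).
    W = rng.standard_normal((window * J * C, embed_dim)) / np.sqrt(window * J * C)
    return clipped @ W

# Example: 32 frames of 17 COCO-style keypoints in (x, y) coordinates.
tokens = tokenize_pose_sequence(np.zeros((32, 17, 2)))
print(tokens.shape)  # (4, 64)
```

Each resulting token summarizes a short temporal window of body poses, so a downstream transformer attends over a handful of compact embeddings instead of thousands of raw pixels, which is what makes the approach lightweight and privacy-preserving.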
@article{rashvand2025_2504.19970,
  title={Shopformer: Transformer-Based Framework for Detecting Shoplifting via Human Pose},
  author={Narges Rashvand and Ghazal Alinezhad Noghre and Armin Danesh Pazho and Babak Rahimi Ardabili and Hamed Tabkhi},
  journal={arXiv preprint arXiv:2504.19970},
  year={2025}
}