Enhance Multi-Scale Spatial-Temporal Coherence for Configurable Video Anomaly Detection
Unsupervised Video Anomaly Detection (VAD) builds on techniques from signal processing. Because the notion of an anomaly is ambiguous and unbounded, detection demands can differ even within a single scenario. Previous methods must retrain their models from scratch, wasting resources, whenever the detection demands change even slightly. To address this, we propose a configurable VAD approach that adapts flexibly to such changes. We also design a dataset with good compatibility for evaluating VAD performance when detection demands change. In addition, videos carry important information about continuous changes in an object's appearance and motion. We therefore propose a module that establishes multi-scale spatial-temporal coherence, which improves accuracy and can dynamically adjust to and accurately capture normal spatial-temporal patterns. Experiments show that our method not only models coherence effectively but also offers better configurability.