Unsupervised Video Anomaly Detection via Normalizing Flows with Implicit Latent Features
Surveillance anomaly detection searches for anomalous events, such as crimes or accidents, among normal scenes. Because such events occur rarely, most training data consists of unlabeled, normal videos, which makes the task challenging. Most existing methods use an autoencoder (AE) to learn to reconstruct normal videos and detect anomalies through the AE's failure to reconstruct the appearance of abnormal scenes. However, because anomalies are distinguished by appearance or motion, many previous approaches have explicitly separated appearance and motion information, for example by using a pre-trained optical flow model. This explicit separation restricts the reciprocal representation capabilities between the two types of information. In contrast, we propose an implicit two-path AE (ITAE), a structure in which two encoders implicitly model appearance and motion features and a single decoder combines them to learn normal video patterns. To handle the complex distribution of normal scenes, we propose estimating the density of ITAE features from normal data with normalizing flow (NF)-based generative models, which learn tractable likelihoods, and we identify anomalies through out-of-distribution detection. NF models enhance ITAE performance by learning normality through implicitly learned features. Finally, we demonstrate the effectiveness of ITAE and its feature distribution modeling on three benchmarks, especially on the Shanghai Tech Campus (ST) database, which is composed of various anomalies in real-world scenarios.
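To make the likelihood-based out-of-distribution step concrete, the following is a minimal sketch and not the paper's model. It assumes synthetic 8-dimensional vectors as stand-ins for ITAE features, and it uses a single affine flow layer, far simpler than the NF models the abstract refers to. The function names `fit_affine_flow` and `log_likelihood` are hypothetical. The sketch fits the flow on normal features only, evaluates the exact log-likelihood through the change-of-variables formula, and uses the negative log-likelihood as the anomaly score.

```python
import numpy as np


def fit_affine_flow(features):
    """Fit the affine flow z = L^{-1}(x - mu) by maximum likelihood.

    With a standard normal base density, the maximum-likelihood solution is
    the sample mean and the Cholesky factor L of the sample covariance.
    """
    mu = features.mean(axis=0)
    dim = features.shape[1]
    # bias=True gives the maximum-likelihood covariance; the small ridge
    # keeps the matrix positive definite for the Cholesky factorization.
    cov = np.cov(features, rowvar=False, bias=True) + 1e-6 * np.eye(dim)
    chol = np.linalg.cholesky(cov)
    return mu, chol


def log_likelihood(x, mu, chol):
    """Exact log p(x) = log N(z; 0, I) + log|det(dz/dx)|."""
    z = np.linalg.solve(chol, (x - mu).T).T
    dim = x.shape[1]
    log_base = -0.5 * (z ** 2).sum(axis=1) - 0.5 * dim * np.log(2 * np.pi)
    # dz/dx = L^{-1}, so log|det| = -sum(log(diag(L))).
    log_det = -np.log(np.diag(chol)).sum()
    return log_base + log_det


# Synthetic stand-ins for features (assumption: not real ITAE outputs).
rng = np.random.default_rng(0)
normal_train = rng.normal(0.0, 1.0, size=(2000, 8))
normal_test = rng.normal(0.0, 1.0, size=(200, 8))
anomalous_test = rng.normal(4.0, 1.0, size=(200, 8))

mu, chol = fit_affine_flow(normal_train)

# Anomaly score: negative log-likelihood under the density of normal data.
score_normal = -log_likelihood(normal_test, mu, chol)
score_anomalous = -log_likelihood(anomalous_test, mu, chol)
```

In this toy setting the shifted samples receive higher scores than the held-out normal samples, which is the behavior an out-of-distribution detector relies on. A deeper flow would replace the single affine map with a stack of invertible layers while keeping the same scoring rule.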