Comparison of Anomaly Detectors: Context Matters

Deep generative models are challenging the classical methods in the field of anomaly detection nowadays. Every new method provides evidence of outperforming its predecessors, often with contradictory results. The objective of this comparison is twofold: to compare anomaly detection methods of various paradigms with focus on deep generative models, and identification of sources of variability that can yield different results. The methods were compared on popular tabular and image datasets. We identified the main sources of variability to be experimental conditions: i) the type data set (tabular or image) and the nature of anomalies (statistical or semantic), and ii) strategy of selection of hyperparameters, especially the number of available anomalies in the validation set. Different methods perform the best in different contexts, i.e. combination of experimental conditions together with computational time. This explains the variability of the previous results and highlights the importance of careful specification of the context in the publication of a new method. All our code and results are available for download.
View on arXiv