An Axiomatic Perspective on Anomaly Detection
Abstract
A major challenge for both the theoretical treatment and the practical application of unsupervised learning tasks, such as clustering, anomaly detection or generative modeling, is the inherent lack of quantifiable objectives. Choosing methods and evaluating outcomes is then often a matter of ad-hoc heuristics or personal taste. Anomaly detection is often employed as a preprocessing step for other learning tasks, so unsound decisions at this stage may have far-reaching consequences. In this work, we propose an axiomatic framework for analyzing the behaviours of anomaly detection methods. We propose a basic set of desirable properties (or axioms) for distance-based anomaly detection methods and identify dependencies and (in-)consistencies between subsets of these. In addition, we include empirical results that demonstrate the benefits of this axiomatic perspective. Our experiments illustrate how some commonly employed algorithms violate, perhaps unexpectedly, a basic desirable property. Namely, we highlight a material problem with the commonly used Isolation Forest method: bands of space that extend infinitely far from the training data are nonetheless likely to be labelled as inliers. Additionally, we experimentally demonstrate that another common method, Local Outlier Factor, is vulnerable to adversarial data poisoning. To conduct these experimental evaluations, we built a tool for dataset generation, experimentation and visualization, which is an additional contribution of this work.
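The Isolation Forest observation can be illustrated with a small, self-contained sketch. The code below is a simplified pure-Python isolation forest written only for illustration; it is not the tool described in this work, nor the implementation of any particular library, and all function names, the toy dataset and the parameter values (sample size, number of trees) are assumptions. Because every split threshold is drawn from within the range of the training data, a query point that lies beyond that range along one axis follows the same path through each tree no matter how far away it is, so its anomaly score stops changing with distance.

```python
import math
import random

def build_tree(points, depth, max_depth, rng):
    """Grow one isolation tree using random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}  # leaf node
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}  # cannot split on this feature
    # The threshold always lies inside the range of the training data.
    threshold = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < threshold]
    right = [p for p in points if p[dim] >= threshold]
    return {
        "dim": dim,
        "threshold": threshold,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }

def c_factor(n):
    """Average path length of an unsuccessful search in a binary search tree."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def path_length(tree, x, depth=0):
    """Depth at which x reaches a leaf, plus the leaf-size adjustment."""
    if "dim" not in tree:
        return depth + c_factor(tree["size"])
    branch = "left" if x[tree["dim"]] < tree["threshold"] else "right"
    return path_length(tree[branch], x, depth + 1)

def anomaly_score(forest, x, sample_size):
    """Score in (0, 1); higher values indicate more anomalous points."""
    mean_depth = sum(path_length(t, x) for t in forest) / len(forest)
    return 2.0 ** (-mean_depth / c_factor(sample_size))

# Toy training data (assumed): uniform points in the square [-1, 1] x [-1, 1].
rng = random.Random(0)
data = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(256)]

sample_size = 64
max_depth = math.ceil(math.log2(sample_size))
forest = [
    build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
    for _ in range(100)
]

# Two query points directly "above" the data: one close by, one extremely far.
near = anomaly_score(forest, (0.0, 2.0), sample_size)
far = anomaly_score(forest, (0.0, 1e9), sample_size)
print(near, far)
```

In this sketch the two printed scores are identical: moving the query point from a distance of 2 to a distance of 10^9 along the vertical axis does not make it look any more anomalous. Whether such points fall below a given inlier threshold depends on the data and on the threshold chosen, which this sketch does not attempt to reproduce.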