Anonymization is the process of removing or hiding sensitive information in logs. It allows organizations to share network logs without exposing sensitive information. However, there is an inherent trade-off between limiting the information revealed in a log and preserving the log's usefulness to the client (the utility of the log). There are many anonymization techniques, and many ways to anonymize a particular log (that is, which fields to anonymize and how). Different anonymization policies result in logs with varying levels of utility for analysis. In this paper we explore the effect of different anonymization policies on logs. We provide an empirical analysis of that effect, measured by the number of alerts an Intrusion Detection System generates on the anonymized log. This is the first work to thoroughly evaluate the effect of single-field anonymization policies on a data set. Our main contribution is to identify a set of fields whose anonymization has a large impact on the utility of a log.
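The evaluation idea described above (anonymize one field at a time, then compare alert counts against the unmodified log) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the record fields, the sample data, the hash-based anonymization, and the two toy detection rules are hypothetical stand-ins, not the paper's data set, anonymization techniques, or Intrusion Detection System.

```python
import hashlib

# Hypothetical flow records; field names and values are illustrative only.
RECORDS = [
    {"src_ip": "10.0.0.5", "dst_ip": "192.168.1.10", "dst_port": 22},
    {"src_ip": "10.0.0.7", "dst_ip": "192.168.1.10", "dst_port": 80},
    {"src_ip": "10.0.0.5", "dst_ip": "192.168.1.11", "dst_port": 443},
    {"src_ip": "10.0.0.9", "dst_ip": "192.168.1.12", "dst_port": 22},
]


def anonymize_field(record, field):
    """Return a copy of the record with one field replaced by a short hash.

    Hashing is just one possible technique, chosen here as an assumption.
    """
    out = dict(record)
    digest = hashlib.sha256(str(record[field]).encode()).hexdigest()[:8]
    out[field] = digest
    return out


def count_alerts(records):
    """Toy stand-in for an IDS: one alert per rule match.

    Rule 1 (assumed): traffic to port 22.
    Rule 2 (assumed): traffic from a blocklisted source address.
    """
    alerts = 0
    for r in records:
        if r["dst_port"] == 22:
            alerts += 1
        if r["src_ip"] == "10.0.0.5":
            alerts += 1
    return alerts


def alert_counts_by_policy(records):
    """Apply each single-field policy and count the resulting alerts."""
    baseline = count_alerts(records)
    results = {}
    for field in records[0]:
        anonymized = [anonymize_field(r, field) for r in records]
        results[field] = count_alerts(anonymized)
    return baseline, results


if __name__ == "__main__":
    baseline, results = alert_counts_by_policy(RECORDS)
    print("baseline alerts:", baseline)
    for field, alerts in results.items():
        print(f"anonymize {field}: {alerts} alerts")
```

In this toy setup, anonymizing a field that a rule inspects lowers the alert count, while anonymizing a field no rule inspects leaves it unchanged. That gap relative to the baseline is the kind of utility signal the abstract describes.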