Hardening Classifiers against Evasion: the Good, the Bad, and the Ugly

Abstract

Machine learning is widely used in security applications, particularly in the form of statistical classification aimed at distinguishing benign from malicious entities. Recent research has shown that such classifiers are often vulnerable to evasion attacks, whereby adversaries change their behavior so as to be classified as benign while preserving malicious functionality. Research into evasion attacks has followed two paradigms: attacks in problem space, where the actual malicious instance, such as a PDF file, is modified; and attacks in feature space, where the attack is abstracted into directly modifying the numerical features of malicious instances rather than the instances themselves. However, no prior work has validated how well feature space threat models represent real evasion attacks. We make several contributions to address this gap, using PDF malware detection as a case study with four PDF malware detectors. First, we use iterative retraining to create a baseline for evasion-robust PDF malware detection, placing an automated problem space attack generator in the retraining loop. Second, we use this baseline to demonstrate that replacing problem space attacks with feature space attacks may significantly reduce the robustness of the resulting classifier. Third, we demonstrate the existence of conserved (or invariant) features, show how these can be leveraged to design evasion-robust classifiers that are nearly as effective as those relying on problem space attacks, and present an approach for automatically identifying conserved features of PDF malware detectors. Finally, we evaluate the generalizability of evasion defense through retraining by considering two additional evasion attacks.
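The iterative retraining baseline described above can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: the one-feature `ThresholdClassifier`, the `toy_attack` evasion routine, and all names here are our own stand-ins for a real detector and a problem space attack generator. The key structure is the loop that alternates between fitting the classifier and folding successful evasive variants back into the training data until the attack no longer succeeds.

```python
# Hypothetical sketch of evasion-robust training via iterative retraining.
# A real setting would use an actual malware detector and a problem space
# attack (e.g., mutating real PDF files); here both are toy stand-ins.

class ThresholdClassifier:
    """Toy 1-D classifier: predicts malicious (1) if the feature >= threshold."""
    def __init__(self):
        self.threshold = 0.0

    def fit(self, X, y):
        # Place the threshold midway between the largest benign value
        # and the smallest malicious value seen in training.
        benign = [x for x, lab in zip(X, y) if lab == 0]
        malicious = [x for x, lab in zip(X, y) if lab == 1]
        self.threshold = (max(benign) + min(malicious)) / 2

    def predict(self, x):
        return 1 if x >= self.threshold else 0


def toy_attack(clf, x):
    """Toy evasion: lower the feature until misclassified as benign.

    Values at or below 0.2 are assumed to break malicious functionality
    (a stand-in for the attack's functionality-preservation constraint),
    in which case the attack fails and returns None.
    """
    candidate = x
    for _ in range(20):
        candidate = round(candidate - 0.1, 2)
        if candidate <= 0.2:
            return None  # cannot evade without losing functionality
        if clf.predict(candidate) == 0:
            return candidate  # successful evasive variant
    return None


def iterative_retrain(clf, X, y, attack, rounds=10):
    """Retrain until the attack stops producing evasive variants."""
    for _ in range(rounds):
        clf.fit(X, y)
        # Attack every malicious training sample against the current model.
        evasive = [attack(clf, x) for x, lab in zip(X, y) if lab == 1]
        evasive = [e for e in evasive if e is not None]
        if not evasive:
            break  # classifier is robust to this attack
        # Fold the evasive variants back in, labeled malicious.
        X = X + evasive
        y = y + [1] * len(evasive)
    return clf
```

For example, training on benign samples {0.1, 0.15} and malicious samples {0.8, 0.9} initially yields a loose threshold that `toy_attack` can slip under; after one retraining round the threshold tightens enough that every remaining evasion attempt violates the functionality constraint, and the loop terminates.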
