Learning the Language of NVMe Streams for Ransomware Detection

7 February 2025

Main: 10 Pages

10 Figures

Bibliography: 3 Pages

20 Tables

Appendix: 12 Pages

Abstract

We apply language modeling techniques to detect ransomware activity in NVMe command sequences. We design and train two types of transformer-based models: the Command-Level Transformer (CLT) performs in-context token classification to determine whether individual commands are initiated by ransomware, and the Patch-Level Transformer (PLT) predicts the volume of data accessed by ransomware within a patch of commands. We present both model designs and the corresponding tokenization and embedding schemes and show that they improve over state-of-the-art tabular methods by up to 24% in missed-detection rate, 66% in data loss prevention, and 84% in identifying data accessed by ransomware.
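
To make the two prediction targets concrete, the following is a minimal sketch of how they could be derived from a labeled command trace. It is not the paper's tokenization or embedding scheme: the `Command` fields, the `BLOCK_BYTES` value, and the fixed, non-overlapping patch size are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List

# Assumed logical block size in bytes; illustrative, not taken from the paper.
BLOCK_BYTES = 4096


@dataclass
class Command:
    """A simplified, hypothetical view of one NVMe command in a trace."""
    opcode: str            # e.g. "read" or "write"
    lba: int               # starting logical block address
    num_blocks: int        # number of blocks accessed
    from_ransomware: bool  # ground-truth label, known only at training time


def command_level_targets(cmds: List[Command]) -> List[int]:
    """Per-command binary labels, in the spirit of token classification (CLT)."""
    return [int(c.from_ransomware) for c in cmds]


def patch_level_targets(cmds: List[Command], patch_size: int) -> List[int]:
    """Bytes accessed by ransomware in each patch of commands (PLT-style target).

    Patches here are consecutive, non-overlapping groups of `patch_size`
    commands; the last patch may be shorter.
    """
    targets = []
    for start in range(0, len(cmds), patch_size):
        patch = cmds[start:start + patch_size]
        ransomware_bytes = sum(
            c.num_blocks * BLOCK_BYTES for c in patch if c.from_ransomware
        )
        targets.append(ransomware_bytes)
    return targets


trace = [
    Command("read", 0, 8, False),
    Command("read", 100, 4, True),
    Command("write", 100, 4, True),
    Command("write", 8, 2, False),
]
clt_labels = command_level_targets(trace)
plt_volumes = patch_level_targets(trace, patch_size=2)
```

In this sketch the command-level target is one label per command, while the patch-level target is a single volume per group of commands, which mirrors the distinction the abstract draws between classifying individual commands and estimating the data accessed by ransomware within a patch.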
