PacketCLIP: Multi-Modal Embedding of Network Traffic and Language for Cybersecurity Reasoning
Traffic classification is vital for cybersecurity, yet encrypted traffic poses significant challenges. We present PacketCLIP, a multi-modal framework combining packet data with natural language semantics through contrastive pretraining and hierarchical Graph Neural Network (GNN) reasoning. PacketCLIP integrates semantic reasoning with efficient classification, enabling robust detection of anomalies in encrypted network flows. By aligning textual descriptions with packet behaviors, it offers enhanced interpretability, scalability, and practical applicability across diverse security scenarios. PacketCLIP achieves a 95% mean AUC, outperforms baselines by 11.6%, and reduces model size by 92%, making it ideal for real-time anomaly detection. By bridging advanced machine learning techniques and practical cybersecurity needs, PacketCLIP provides a foundation for scalable, efficient, and interpretable solutions to tackle encrypted traffic classification and network intrusion detection challenges in resource-constrained environments.
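The abstract does not spell out the contrastive pretraining objective, so the following is only a minimal sketch of a generic CLIP-style symmetric contrastive loss, assuming each packet-flow embedding in a batch is paired with exactly one text-description embedding. The function names (`l2_normalize`, `cross_entropy_diagonal`, `clip_style_loss`), the temperature of 0.07, and the toy two-dimensional embeddings are illustrative assumptions, not details taken from PacketCLIP.

```python
import math

def l2_normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def clip_style_loss(packet_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings (assumed pairing: index i with index i)."""
    p = [l2_normalize(v) for v in packet_embs]
    t = [l2_normalize(v) for v in text_embs]
    # Similarity matrix: rows are packet embeddings, columns are text embeddings.
    logits = [[sum(a * b for a, b in zip(pi, tj)) / temperature for tj in t] for pi in p]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-packet direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))

# Hypothetical toy batch: two flows and two descriptions.
aligned = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

The loss is small when each packet embedding is most similar to its own description and large when the pairing is swapped, which is the sense in which such an objective aligns textual descriptions with packet behaviors.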
@article{masukawa2025_2503.03747,
  title={PacketCLIP: Multi-Modal Embedding of Network Traffic and Language for Cybersecurity Reasoning},
  author={Ryozo Masukawa and Sanggeon Yun and Sungheon Jeong and Wenjun Huang and Yang Ni and Ian Bryant and Nathaniel D. Bastian and Mohsen Imani},
  journal={arXiv preprint arXiv:2503.03747},
  year={2025}
}