
Neural Honeytrace: Plug&Play Watermarking Framework against Model Extraction Attacks

Main: 8 pages, Bibliography: 2 pages, Appendix: 7 pages, 12 figures, 13 tables

Abstract

Triggerable watermarking enables model owners to assert ownership against model extraction attacks. However, most existing approaches require additional training, which limits post-deployment flexibility, and their lack of clear theoretical foundations leaves them vulnerable to adaptive attacks. In this paper, we propose Neural Honeytrace, a plug-and-play watermarking framework that operates without retraining. We redefine the watermark transmission mechanism from an information perspective and design a training-free multi-step transmission strategy that leverages the long-tailed effect of backdoor learning to achieve efficient and robust watermark embedding. Extensive experiments demonstrate that Neural Honeytrace reduces the average number of queries required for a worst-case t-test-based ownership verification to as low as 2% of that required by existing methods, while incurring zero training cost.
