Toward Breaking Watermarks in Distortion-free Large Language Models

25 February 2025
Shayleen Reynolds, Saheed O. Obitayo, Niccolò Dalmasso, Dung Daniel Ngo, Vamsi K. Potluru, Manuela Veloso
    AAML
Abstract

In recent years, LLM watermarking has emerged as an attractive safeguard for identifying AI-generated content, with promising applications in many real-world domains. However, there are growing concerns that current LLM watermarking schemes are vulnerable to expert adversaries who wish to reverse-engineer the watermarking mechanism. Prior work on "breaking" or "stealing" LLM watermarks focuses mainly on the distribution-modifying algorithm of Kirchenbauer et al. (2023), which perturbs the logit vector before sampling. In this work, we focus on reverse-engineering the other prominent LLM watermarking scheme, distortion-free watermarking (Kuditipudi et al. 2024), which preserves the underlying token distribution by using a hidden watermarking key sequence. We demonstrate that, even under this more sophisticated scheme, it is possible to "compromise" the LLM and carry out a "spoofing" attack. Specifically, we propose a mixed integer linear programming framework that accurately estimates the secret key used for watermarking from only a few samples of watermarked text. Our initial findings challenge current theoretical claims about the robustness and usability of existing LLM watermarking techniques.
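The abstract does not describe the mixed integer linear programming formulation itself, so it is not reproduced here. As a rough illustration of the kind of mechanism such an attack targets, the sketch below implements a simplified distortion-free decoder driven by a hidden key sequence, in the spirit of the exponential-minimum (Gumbel-style) sampling used in this line of work; the vocabulary size, key length, and toy next-token distribution are illustrative assumptions, not the paper's actual setup.

import numpy as np

# Illustrative sketch only: a simplified distortion-free sampler.
# Vocabulary size, key length, and the toy distribution are assumptions for this demo.
VOCAB_SIZE = 8
KEY_LEN = 16

rng = np.random.default_rng(0)
# Hidden watermarking key: one uniform [0, 1) value per (position, token) pair.
secret_key = rng.random((KEY_LEN, VOCAB_SIZE))

def watermarked_sample(probs: np.ndarray, step: int) -> int:
    """Pick argmax_v key[step, v] ** (1 / p_v).

    Marginally (over a random key) this draws token v with probability p_v,
    so the output distribution is unchanged ("distortion-free"), yet the choice
    is deterministic given the secret key -- which is what a spoofing attack
    tries to recover from watermarked samples.
    """
    r = secret_key[step % KEY_LEN]
    return int(np.argmax(r ** (1.0 / np.maximum(probs, 1e-12))))

# Toy usage: decode a short sequence from a fixed next-token distribution.
probs = np.full(VOCAB_SIZE, 1.0 / VOCAB_SIZE)
tokens = [watermarked_sample(probs, t) for t in range(10)]
print(tokens)

An attacker who observes enough tokens generated under the same key positions could, in principle, pose constraints relating the observed argmax choices to the unknown key entries; the paper's MILP framework is one concrete way to formulate that key-recovery problem.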

View on arXiv: https://arxiv.org/abs/2502.18608
@article{reynolds2025_2502.18608,
  title={Toward Breaking Watermarks in Distortion-free Large Language Models},
  author={Shayleen Reynolds and Saheed Obitayo and Niccolò Dalmasso and Dung Daniel T. Ngo and Vamsi K. Potluru and Manuela Veloso},
  journal={arXiv preprint arXiv:2502.18608},
  year={2025}
}