
Character-Level Perturbations Disrupt LLM Watermarks

Main: 12 pages · 8 figures · 17 tables · Bibliography: 2 pages · Appendix: 6 pages
Abstract

Large Language Model (LLM) watermarking embeds detectable signals into generated text for copyright protection, misuse prevention, and content detection. While prior studies evaluate robustness using watermark removal attacks, these methods are often suboptimal, creating the misconception that effective removal requires large perturbations or powerful adversaries.
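To make the title's notion of "character-level perturbations" concrete, here is a minimal sketch of the kind of small text edits such an attack might apply (random adjacent-character swaps, deletions, and homoglyph substitutions). The function name, edit types, and rate parameter are illustrative assumptions, not the paper's actual method.

```python
import random

# Illustrative homoglyph map (Latin -> visually similar Cyrillic); an assumption, not from the paper.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "c": "\u0441"}

def perturb_characters(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Apply random character-level edits (swap, delete, homoglyph substitution)
    to a small fraction of positions, keeping the text largely readable."""
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        if rng.random() < rate:
            op = rng.choice(["swap", "delete", "homoglyph"])
            if op == "swap" and i + 1 < len(chars):
                # Transpose two adjacent characters.
                out.extend([chars[i + 1], chars[i]])
                i += 2
                continue
            if op == "delete":
                # Drop this character entirely.
                i += 1
                continue
            if op == "homoglyph" and chars[i].lower() in HOMOGLYPHS:
                # Replace with a visually similar Unicode character.
                out.append(HOMOGLYPHS[chars[i].lower()])
                i += 1
                continue
        out.append(chars[i])
        i += 1
    return "".join(out)

print(perturb_characters("Watermarked text generated by a large language model."))
```

Even at a low edit rate, such perturbations change the token sequence seen by a watermark detector while keeping the text readable to humans, which is why they can disrupt detection with small, low-cost changes.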
