Character-Level Perturbations Disrupt LLM Watermarks
- AAML, WaLM

Main: 12 Pages
8 Figures
Bibliography: 2 Pages
17 Tables
Appendix: 6 Pages
Abstract
Large Language Model (LLM) watermarking embeds detectable signals into generated text for copyright protection, misuse prevention, and content detection. While prior studies evaluate robustness using watermark removal attacks, these methods are often suboptimal, creating the misconception that effective removal requires large perturbations or powerful adversaries.
