Advancing Beyond Identification: Multi-bit Watermark for Language Models

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

1 August 2023

Kiyoon Yoo

Wonhyuk Ahn

Nojun Kwak

Abstract

This study aims to proactively tackle misuse of large language models beyond the identification of machine-generated text. While existing methods focus on detection, some malicious misuses require tracing the adversarial user in order to counteract them. To address this, we propose "Multi-bit Watermark through Color-listing" (COLOR), which embeds traceable multi-bit information during language model generation. Building on zero-bit watermarking (Kirchenbauer et al., 2023a), COLOR enables extraction without model access and on-the-fly embedding, and it maintains text quality while still allowing zero-bit detection. Preliminary experiments demonstrate successful embedding of 32-bit messages with 91.9% accuracy in moderate-length texts (~500 tokens). This work advances strategies for effectively countering language model misuse.
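The abstract describes COLOR only at a high level, so the following is a hypothetical, simplified sketch of how a multi-bit extension of hash-seeded list watermarking can work. It is not the authors' implementation. It assumes a toy vocabulary whose Gaussian logits stand in for a real language model, r = 4 colored lists so each generated token votes for one base-4 digit (two bits), and a SHA-256 hash of a secret key and the previous token that selects both the message digit a step encodes and the coloring of the vocabulary. All names (`allocate_and_color`, `embed`, `decode`, `DELTA`, and so on) are illustrative.

```python
import hashlib
import math
import random

VOCAB_SIZE = 1000  # toy vocabulary size (assumption)
NUM_COLORS = 4     # r colored lists: each generated token votes for one base-4 digit (assumption)
DELTA = 4.0        # logit bias added to the favored colored list (assumption)


def allocate_and_color(prev_token: int, key: int, num_positions: int):
    """Seed a PRNG from (key, previous token); pick a message position and color the vocabulary."""
    digest = hashlib.sha256(f"{key}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    position = rng.randrange(num_positions)   # which message digit this step encodes
    perm = list(range(VOCAB_SIZE))
    rng.shuffle(perm)
    colors = [0] * VOCAB_SIZE
    for rank, tok in enumerate(perm):          # split the permuted vocabulary into equal lists
        colors[tok] = rank * NUM_COLORS // VOCAB_SIZE
    return position, colors


def bits_to_digits(bits):
    """Pack pairs of bits into base-4 digits (assumes an even number of bits)."""
    return [2 * bits[i] + bits[i + 1] for i in range(0, len(bits), 2)]


def digits_to_bits(digits):
    """Unpack base-4 digits back into bits, high bit first."""
    return [b for d in digits for b in (d >> 1, d & 1)]


def embed(digits, length, key, seed=0):
    """Generate `length` tokens from a toy random 'model', biasing the list that encodes the message."""
    lm_rng = random.Random(seed)               # stands in for a real language model
    tokens = [lm_rng.randrange(VOCAB_SIZE)]    # a one-token "prompt"
    for _ in range(length):
        pos, colors = allocate_and_color(tokens[-1], key, len(digits))
        target = digits[pos]
        logits = [lm_rng.gauss(0.0, 1.0) + (DELTA if colors[t] == target else 0.0)
                  for t in range(VOCAB_SIZE)]
        top = max(logits)
        weights = [math.exp(z - top) for z in logits]   # softmax numerators, numerically stable
        tokens.append(lm_rng.choices(range(VOCAB_SIZE), weights=weights)[0])
    return tokens


def decode(tokens, key, num_positions):
    """Recover the message from tokens and key alone: majority vote of colors per position."""
    counts = [[0] * NUM_COLORS for _ in range(num_positions)]
    for prev, tok in zip(tokens, tokens[1:]):
        pos, colors = allocate_and_color(prev, key, num_positions)
        counts[pos][colors[tok]] += 1
    return [max(range(NUM_COLORS), key=lambda c: row[c]) for row in counts]


if __name__ == "__main__":
    msg_rng = random.Random(123)
    bits = [msg_rng.randrange(2) for _ in range(32)]   # a 32-bit message
    digits = bits_to_digits(bits)                       # 16 base-4 digits
    tokens = embed(digits, length=500, key=42)
    recovered = digits_to_bits(decode(tokens, key=42, num_positions=len(digits)))
    print("bit accuracy:", sum(a == b for a, b in zip(bits, recovered)) / len(bits))
```

Because `decode` uses only the token ids and the key, it needs no access to the model's logits, which mirrors the extraction-without-model-access property described in the abstract. Each digit is recovered by a majority vote over the tokens allocated to it, so longer texts give more votes per digit. The 91.9% figure above comes from the authors' experiments with real models, not from this toy.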