Peering Behind the Shield: Guardrail Identification in Large Language Models

3 February 2025

Papers citing "Peering Behind the Shield: Guardrail Identification in Large Language Models"


Unified Attacks to Large Language Model Watermarks: Spoofing and Scrubbing in Unauthorized Knowledge Distillation
Xin Yi, Shunfan Zheng, Linlin Wang, Xiaoling Wang, Liang He (AAML), 24 Apr 2025