Probing the Robustness of Large Language Models Safety to Latent Perturbations

19 June 2025

Papers citing "Probing the Robustness of Large Language Models Safety to Latent Perturbations"

The Rogue Scalpel: Activation Steering Compromises LLM Safety
Anton Korznikov, Andrey V. Galichin, Alexey Dontsov, Oleg Y. Rogov, Ivan Oseledets, Elena Tutubalina
Tags: LLMSV, AAML
26 Sep 2025