Towards Inference-time Category-wise Safety Steering for Large Language Models

2 October 2024

Papers citing "Towards Inference-time Category-wise Safety Steering for Large Language Models"

Title: Focus On This, Not That! Steering LLMs With Adaptive Feature Specification
Authors: Tom A. Lamb, Adam Davies, Alasdair Paren, Philip H. S. Torr, Francesco Pinto
Date: 30 Oct 2024