Towards LLM Guardrails via Sparse Representation Steering

21 March 2025

Papers citing "Towards LLM Guardrails via Sparse Representation Steering"


Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control. Hannah Cyberey, David E. Evans. 23 Apr 2025.