The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence

24 February 2025

Papers citing "The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence"

A Probabilistic Perspective on Unlearning and Alignment for Large Language Models. Yan Scholten, Stephan Günnemann, Leo Schwinn. [MU]. 04 Oct 2024.