arXiv:2410.03523
A Probabilistic Perspective on Unlearning and Alignment for Large Language Models
4 October 2024
Yan Scholten, Stephan Günnemann, Leo Schwinn
Tags: MU
Papers citing "A Probabilistic Perspective on Unlearning and Alignment for Large Language Models" (4 / 4 papers shown)
LLM-Safety Evaluations Lack Robustness
Tim Beyer, Sophie Xhonneux, Simon Geisler, Gauthier Gidel, Leo Schwinn, Stephan Günnemann
Tags: ALM, ELM · 83 · 0 · 0 · 04 Mar 2025
The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence
Tom Wollschläger, Jannes Elstner, Simon Geisler, Vincent Cohen-Addad, Stephan Günnemann, Johannes Gasteiger
Tags: LLMSV · 47 · 0 · 0 · 24 Feb 2025
REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective
Simon Geisler, Tom Wollschläger, M. H. I. Abdalla, Vincent Cohen-Addad, Johannes Gasteiger, Stephan Günnemann
Tags: AAML · 68 · 2 · 0 · 24 Feb 2025
Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space
Leo Schwinn, David Dobre, Sophie Xhonneux, Gauthier Gidel, Stephan Günnemann
Tags: AAML · 28 · 36 · 0 · 14 Feb 2024