arXiv:2502.17254
REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective
24 February 2025
Simon Geisler
Tom Wollschlager
M. H. I. Abdalla
Vincent Cohen-Addad
Johannes Gasteiger
Stephan Günnemann
AAML
Papers citing
"REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective"
3 / 3 papers shown
Title
The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence
Tom Wollschlager
Jannes Elstner
Simon Geisler
Vincent Cohen-Addad
Stephan Günnemann
Johannes Gasteiger
LLMSV
47
0
0
24 Feb 2025
A Probabilistic Perspective on Unlearning and Alignment for Large Language Models
Yan Scholten
Stephan Günnemann
Leo Schwinn
MU
46
6
0
04 Oct 2024
Attacking Large Language Models with Projected Gradient Descent
Simon Geisler
Tom Wollschlager
M. H. I. Abdalla
Johannes Gasteiger
Stephan Günnemann
AAML
SILM
39
48
0
14 Feb 2024