2505.19056
Cited By
An Embarrassingly Simple Defense Against LLM Abliteration Attacks
25 May 2025
Harethah Shairah
Hasan Hammoud
Bernard Ghanem
G. Turkiyyah
Papers citing "An Embarrassingly Simple Defense Against LLM Abliteration Attacks"
2 / 2 papers shown
Title
Turning the Spell Around: Lightweight Alignment Amplification via Rank-One Safety Injection
Harethah Shairah
Hasan Hammoud
G. Turkiyyah
Bernard Ghanem
LLMSV
36
0
0
28 Aug 2025
Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale datasets for Responsible LLMs
Sai Krishna Mendu
Harish Yenala
Aditi Gulati
Shanu Kumar
Parag Agrawal
191
4
0
04 May 2025