ResearchTrend.AI
An Embarrassingly Simple Defense Against LLM Abliteration Attacks

25 May 2025
Harethah Shairah
Hasan Hammoud
Bernard Ghanem
G. Turkiyyah
ArXiv (abs) | PDF | HTML | HuggingFace (5 upvotes)

Papers citing "An Embarrassingly Simple Defense Against LLM Abliteration Attacks"

2 / 2 papers shown
Turning the Spell Around: Lightweight Alignment Amplification via Rank-One Safety Injection
Harethah Shairah
Hasan Hammoud
G. Turkiyyah
Bernard Ghanem
LLMSV
36
0
0
28 Aug 2025
Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale datasets for Responsible LLMs
Sai Krishna Mendu
Harish Yenala
Aditi Gulati
Shanu Kumar
Parag Agrawal
191
4
0
04 May 2025