Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2406.11801
Cited By
Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations
17 June 2024
Rima Hazra
Sayan Layek
Somnath Banerjee
Soujanya Poria
KELM
LLMSV
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations"
5 / 5 papers shown
Title
Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge
Yan-Lun Chen
Yi-Ru Wei
Chia-Yi Hsu
Chia-Mu Yu
Chun-ying Huang
Ying-Dar Lin
Yu-Sung Wu
Wei-Bin Lee
MoMe
KELM
48
0
0
27 Feb 2025
Soteria: Language-Specific Functional Parameter Steering for Multilingual Safety Alignment
Somnath Banerjee
Sayan Layek
Pratyush Chatterjee
Animesh Mukherjee
Rima Hazra
LLMSV
71
0
0
16 Feb 2025
Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic
Rishabh Bhardwaj
Do Duc Anh
Soujanya Poria
MoMe
48
35
0
19 Feb 2024
SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding
Zhangchen Xu
Fengqing Jiang
Luyao Niu
Jinyuan Jia
Bill Yuchen Lin
Radha Poovendran
AAML
129
82
0
14 Feb 2024
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Yao Lu
Max Bartolo
Alastair Moore
Sebastian Riedel
Pontus Stenetorp
AILaw
LRM
274
1,114
0
18 Apr 2021
1