Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.11746
Cited By
Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic
19 February 2024
Rishabh Bhardwaj
Do Duc Anh
Soujanya Poria
MoMe
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic"
7 / 7 papers shown
Title
AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization
H. Luo
Haiying He
Y. Wang
Jinluan Yang
Rui Liu
Naiqiang Tan
Xiaochun Cao
Dacheng Tao
Li Shen
LRM
21
0
0
30 Apr 2025
Adaptive Helpfulness-Harmlessness Alignment with Preference Vectors
Ren-Wei Liang
Chin-Ting Hsu
Chan-Hung Yu
Saransh Agrawal
Shih-Cheng Huang
Shang-Tse Chen
Kuan-Hao Huang
Shao-Hua Sun
71
0
0
27 Apr 2025
Multi-Task Model Merging via Adaptive Weight Disentanglement
Feng Xiong
Runxi Cheng
Wang Chen
Zhanqiu Zhang
Yiwen Guo
Chun Yuan
Ruifeng Xu
MoMe
80
4
0
10 Jan 2025
JailbreakLens: Interpreting Jailbreak Mechanism in the Lens of Representation and Circuit
Zeqing He
Zhibo Wang
Zhixuan Chu
Huiyu Xu
Rui Zheng
Kui Ren
Chun Chen
36
3
0
17 Nov 2024
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
Paul Röttger
Fabio Pernisi
Bertie Vidgen
Dirk Hovy
ELM
KELM
49
30
0
08 Apr 2024
Language Model Unalignment: Parametric Red-Teaming to Expose Hidden Harms and Biases
Rishabh Bhardwaj
Soujanya Poria
ALM
34
14
0
22 Oct 2023
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
275
3,784
0
18 Apr 2021
1