ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic
arXiv: 2402.11746

19 February 2024
Rishabh Bhardwaj
Do Duc Anh
Soujanya Poria
    MoMe

Papers citing "Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic"

AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization
H. Luo
Haiying He
Y. Wang
Jinluan Yang
Rui Liu
Naiqiang Tan
Xiaochun Cao
Dacheng Tao
Li Shen
LRM
21
0
0
30 Apr 2025
Adaptive Helpfulness-Harmlessness Alignment with Preference Vectors
Ren-Wei Liang
Chin-Ting Hsu
Chan-Hung Yu
Saransh Agrawal
Shih-Cheng Huang
Shang-Tse Chen
Kuan-Hao Huang
Shao-Hua Sun
71
0
0
27 Apr 2025
Multi-Task Model Merging via Adaptive Weight Disentanglement
Feng Xiong
Runxi Cheng
Wang Chen
Zhanqiu Zhang
Yiwen Guo
Chun Yuan
Ruifeng Xu
MoMe
80
4
0
10 Jan 2025
JailbreakLens: Interpreting Jailbreak Mechanism in the Lens of Representation and Circuit
Zeqing He
Zhibo Wang
Zhixuan Chu
Huiyu Xu
Rui Zheng
Kui Ren
Chun Chen
36
3
0
17 Nov 2024
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
Paul Röttger
Fabio Pernisi
Bertie Vidgen
Dirk Hovy
ELM
KELM
49
30
0
08 Apr 2024
Language Model Unalignment: Parametric Red-Teaming to Expose Hidden Harms and Biases
Rishabh Bhardwaj
Soujanya Poria
ALM
31
14
0
22 Oct 2023
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
275
3,784
0
18 Apr 2021