
LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint

Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Main: 9 pages · Appendix: 6 pages · Bibliography: 4 pages · 6 figures · 8 tables
Abstract

Fine-tuning pre-trained Large Language Models (LLMs) for specialized tasks incurs substantial computational and data costs. While model merging offers a training-free solution to integrate multiple task-specific models, existing methods suffer from safety-utility conflicts where enhanced general capabilities degrade safety safeguards. We identify two root causes: neuron misidentification due to simplistic parameter magnitude-based selection, and cross-task neuron interference during merging. To address these challenges, we propose LED-Merging, a three-stage framework that Locates task-specific neurons via gradient-based attribution, dynamically Elects critical neurons through multi-model importance fusion, and Disjoints conflicting updates through parameter isolation. Extensive experiments on Llama-3-8B, Mistral-7B, and Llama2-13B demonstrate that LED-Merging effectively reduces harmful response rates, showing a 31.4% decrease on Llama-3-8B-Instruct on HarmBench, while simultaneously preserving 95% of utility performance, such as achieving 52.39% accuracy on GSM8K. LED-Merging resolves safety-utility conflicts and provides a lightweight, training-free paradigm for constructing reliable multi-task LLMs. Code is available at this https URL.
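The three stages described above (Locate, Elect, Disjoint) can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the importance scores are passed in as plain arrays (a stand-in for gradient-based attribution), election is approximated by summing scores across models, and all function and parameter names (`led_merge`, `top_frac`) are hypothetical.

```python
import numpy as np
from collections import Counter

def led_merge(base, task_vectors, scores, top_frac=0.5):
    """Toy Locate-Elect-Disjoint merge over flattened parameters.

    base: 1-D array of base-model parameters.
    task_vectors: list of 1-D arrays (fine-tuned params minus base).
    scores: list of 1-D per-parameter importance arrays, one per task
            (stand-in for gradient-based attribution).
    """
    k = max(1, int(top_frac * base.size))

    # Locate: each task claims its top-k most important parameters.
    located = [set(np.argsort(-s)[:k]) for s in scores]

    # Elect: fuse importance across models and keep only parameters
    # that are also globally top-k under the fused score.
    fused = np.sum(scores, axis=0)
    elected = set(np.argsort(-fused)[:k])
    masks = [loc & elected for loc in located]

    # Disjoint: drop parameters claimed by more than one task,
    # so conflicting updates never overlap.
    counts = Counter(i for m in masks for i in m)
    merged = base.copy()
    for tv, mask in zip(task_vectors, masks):
        for i in mask:
            if counts[i] == 1:
                merged[i] += tv[i]
    return merged
```

Because each parameter index receives at most one task's update, no two task vectors ever write to the same slot, which is the core isolation idea behind the Disjoint stage.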
