Soteria: Language-Specific Functional Parameter Steering for Multilingual Safety Alignment

16 February 2025
Somnath Banerjee
Sayan Layek
Pratyush Chatterjee
Animesh Mukherjee
Rima Hazra
Abstract

Ensuring consistent safety across multiple languages remains a significant challenge for large language models (LLMs). We introduce Soteria, a lightweight yet powerful strategy that locates and minimally adjusts the "functional heads" most responsible for harmful content generation in each language. By altering only a fraction of parameters, Soteria drastically reduces policy violations without sacrificing overall model performance, even in low-resource settings. To rigorously evaluate our approach, we also present XThreatBench, a specialized multilingual dataset capturing fine-grained harmful behaviors drawn from real policy guidelines. Experiments with leading open-source LLMs (e.g., Llama, Qwen, Mistral) show that Soteria consistently improves safety metrics across high-, mid-, and low-resource languages. These findings highlight a promising path toward scalable, linguistically attuned, and ethically aligned LLMs worldwide.
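To make the idea of "functional head" steering concrete, below is a minimal sketch of one plausible instantiation: score each attention head by the gradient magnitude its output projection receives on harmful examples in a target language, then zero out the output columns of the top-ranked heads. This is an illustrative assumption, not the paper's exact procedure; the model name, the placeholder probe data, the gradient-attribution scoring rule, and the hard-zero edit are all hypothetical choices, and it assumes a Llama-style architecture where `o_proj` columns are grouped per head.

```python
# Hypothetical sketch of language-specific head ablation in the spirit of
# Soteria. All specifics (model, data, scoring, ablation rule) are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-7b-hf"  # assumed; any Llama-style checkpoint
TOP_K = 32                          # assumed: edit only a small fraction of heads

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float32)

# Hypothetical per-language probe set: prompts paired with harmful completions.
harmful_texts = ["<harmful example in the target language>"]  # placeholder data

cfg = model.config
head_dim = cfg.hidden_size // cfg.num_attention_heads
scores = torch.zeros(cfg.num_hidden_layers, cfg.num_attention_heads)

for text in harmful_texts:
    batch = tok(text, return_tensors="pt")
    model.zero_grad()
    out = model(**batch, labels=batch["input_ids"])
    out.loss.backward()
    for li, layer in enumerate(model.model.layers):
        g = layer.self_attn.o_proj.weight.grad  # (hidden, n_heads * head_dim)
        # Column block h*head_dim:(h+1)*head_dim carries head h's output path,
        # so reshaping the columns groups the gradient per head.
        per_head = g.view(cfg.hidden_size, cfg.num_attention_heads, head_dim)
        scores[li] += per_head.norm(dim=(0, 2))

# Rank heads by accumulated gradient norm and zero the top-k heads' output
# columns, i.e. minimally edit only the parameters most implicated in
# harmful generation for this language.
top = scores.flatten().topk(TOP_K).indices
with torch.no_grad():
    for idx in top.tolist():
        li, h = divmod(idx, cfg.num_attention_heads)
        w = model.model.layers[li].self_attn.o_proj.weight
        w[:, h * head_dim : (h + 1) * head_dim] = 0.0
```

A softer variant would rescale rather than zero the selected columns; the abstract's claim that only a fraction of parameters changes holds either way, since each edit touches a single head's output slice.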

@article{banerjee2025_2502.11244,
  title={Soteria: Language-Specific Functional Parameter Steering for Multilingual Safety Alignment},
  author={Somnath Banerjee and Sayan Layek and Pratyush Chatterjee and Animesh Mukherjee and Rima Hazra},
  journal={arXiv preprint arXiv:2502.11244},
  year={2025}
}