Attributional Safety Failures in Large Language Models under Code-Mixed Perturbations

20 May 2025
Somnath Banerjee
Pratyush Chatterjee
Shanu Kumar
Sayan Layek
Parag Agrawal
Rima Hazra
Animesh Mukherjee
    AAML
Abstract

Recent advancements in LLMs have raised significant safety concerns, particularly when dealing with code-mixed inputs and outputs. Our study systematically investigates the increased susceptibility of LLMs to produce unsafe outputs from code-mixed prompts compared to monolingual English prompts. Using explainability methods, we dissect the internal attribution shifts that cause the models' harmful behaviors. In addition, we explore cultural dimensions by distinguishing between universally unsafe and culturally specific unsafe queries. This paper presents novel experimental insights that clarify the mechanisms driving this phenomenon.

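As a rough illustration of the kind of attribution analysis the abstract describes, the sketch below computes gradient-times-input token saliencies for a model's next-token prediction, once for an English prompt and once for a code-mixed (Hindi-English) paraphrase. This is a minimal sketch, not the authors' method: the model name, the prompt pair, and the saliency measure are all illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's implementation):
# gradient-times-input token attribution for an English prompt vs.
# a hypothetical code-mixed paraphrase.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # assumption: any small causal LM suffices for the sketch
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def token_saliency(prompt: str):
    """Return (tokens, saliency scores) attributing the top next-token logit
    to each input token via |grad * embedding|."""
    enc = tokenizer(prompt, return_tensors="pt")
    # Embed the input ourselves so we can take gradients w.r.t. the embeddings.
    embeds = model.get_input_embeddings()(enc["input_ids"]).detach()
    embeds.requires_grad_(True)
    out = model(inputs_embeds=embeds, attention_mask=enc["attention_mask"])
    # Attribute the most likely next token's logit back to the input tokens.
    top_logit = out.logits[0, -1].max()
    top_logit.backward()
    scores = (embeds.grad * embeds).detach().norm(dim=-1).squeeze(0)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    return tokens, scores

# Hypothetical prompt pair: monolingual English vs. a code-mixed rewrite.
for prompt in ["How do I pick a strong password?",
               "Strong password kaise choose karein?"]:
    tokens, scores = token_saliency(prompt)
    print(prompt)
    for tok, s in zip(tokens, scores.tolist()):
        print(f"  {tok:>15s}  {s:.4f}")
```

In a study like the one described, one would compare how attribution mass redistributes between the two prompt variants (for example, away from safety-relevant tokens) rather than inspect a single prompt in isolation.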
@article{banerjee2025_2505.14469,
  title={Attributional Safety Failures in Large Language Models under Code-Mixed Perturbations},
  author={Somnath Banerjee and Pratyush Chatterjee and Shanu Kumar and Sayan Layek and Parag Agrawal and Rima Hazra and Animesh Mukherjee},
  journal={arXiv preprint arXiv:2505.14469},
  year={2025}
}