One Model Transfer to All: On Robust Jailbreak Prompts Generation against LLMs. International Conference on Learning Representations (ICLR), 2025.
DETAM: Defending LLMs Against Jailbreak Attacks via Targeted Attention Modification. Annual Meeting of the Association for Computational Linguistics (ACL), 2025.
You Can't Eat Your Cake and Have It Too: The Performance Degradation of LLMs with Jailbreak Defense. The Web Conference (WWW), 2025.
SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation. International Conference on Learning Representations (ICLR), 2025.
Recent Advancements in LLM Red-Teaming: Techniques, Defenses, and Ethical Considerations. Tarun Raheja, Nilay Pochhi.
Mission Impossible: A Statistical Perspective on Jailbreaking LLMs. Neural Information Processing Systems (NeurIPS), 2024.
SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner. Xunguang Wang, Daoyuan Wu, Zhenlan Ji, Zongjie Li, Pingchuan Ma, Shuai Wang, Yingjiu Li, Yang Liu, Ning Liu, Juergen Rahmel.