Injecting Universal Jailbreak Backdoors into LLMs in Minutes. International Conference on Learning Representations (ICLR), 2025.
Safety Alignment Should Be Made More Than Just a Few Tokens Deep. International Conference on Learning Representations (ICLR), 2025.
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks. International Conference on Learning Representations (ICLR), 2025.
Shortcuts Everywhere and Nowhere: Exploring Multi-Trigger Backdoor Attacks. IEEE Transactions on Dependable and Secure Computing (IEEE TDSC), 2024.
BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models. International Conference on Learning Representations (ICLR), 2024.
Universal Jailbreak Backdoors from Poisoned Human Feedback. International Conference on Learning Representations (ICLR), 2024.
Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection. North American Chapter of the Association for Computational Linguistics (NAACL), 2024.
BackdoorBench: A Comprehensive Benchmark of Backdoor Learning. Neural Information Processing Systems (NeurIPS), 2022.
Language Models are Few-Shot Learners. Neural Information Processing Systems (NeurIPS), 2020.
Weight Poisoning Attacks on Pre-trained Models. Annual Meeting of the Association for Computational Linguistics (ACL), 2020.