ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates

ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates

8 January 2025

Bill Yuchen Lin

Radha Poovendran

Papers citing "ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates"

8 / 8 papers shown

Title
Teaching Models to Understand (but not Generate) High-risk Data Ryan Yixiang Wang Matthew Finlayson Luca Soldaini Swabha Swayamdipta Robin Jia 34 0 0 05 May 2025
Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots Erfan Shayegani G M Shahariar Sara Abdali Lei Yu Nael B. Abu-Ghazaleh Yue Dong AAML 48 0 0 01 Apr 2025
SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding Zhangchen Xu Fengqing Jiang Luyao Niu Jinyuan Jia Bill Yuchen Lin Radha Poovendran AAML 129 82 0 14 Feb 2024
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts Jiahao Yu Xingwei Lin Zheng Yu Xinyu Xing SILM 110 292 0 19 Sep 2023
Training language models to follow instructions with human feedback Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright ... Amanda Askell Peter Welinder Paul Christiano Jan Leike Ryan J. Lowe OSLM ALM 301 11,730 0 04 Mar 2022
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity Yao Lu Max Bartolo Alastair Moore Sebastian Riedel Pontus Stenetorp AILaw LRM 274 1,114 0 18 Apr 2021
Adversarial Machine Learning at Scale Alexey Kurakin Ian Goodfellow Samy Bengio AAML 253 3,102 0 04 Nov 2016
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation Yonghui Wu M. Schuster Z. Chen Quoc V. Le Mohammad Norouzi ... Alex Rudnick Oriol Vinyals G. Corrado Macduff Hughes J. Dean AIMat 716 6,724 0 26 Sep 2016