ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2406.12935
  4. Cited By
ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates

ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates

8 January 2025
Fengqing Jiang
Zhangchen Xu
Luyao Niu
Bill Yuchen Lin
Radha Poovendran
    SILM
ArXivPDFHTML

Papers citing "ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates"

8 / 8 papers shown
Title
Teaching Models to Understand (but not Generate) High-risk Data
Teaching Models to Understand (but not Generate) High-risk Data
Ryan Yixiang Wang
Matthew Finlayson
Luca Soldaini
Swabha Swayamdipta
Robin Jia
34
0
0
05 May 2025
Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots
Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots
Erfan Shayegani
G M Shahariar
Sara Abdali
Lei Yu
Nael B. Abu-Ghazaleh
Yue Dong
AAML
48
0
0
01 Apr 2025
SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware
  Decoding
SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding
Zhangchen Xu
Fengqing Jiang
Luyao Niu
Jinyuan Jia
Bill Yuchen Lin
Radha Poovendran
AAML
129
82
0
14 Feb 2024
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated
  Jailbreak Prompts
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
110
292
0
19 Sep 2023
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Fantastically Ordered Prompts and Where to Find Them: Overcoming
  Few-Shot Prompt Order Sensitivity
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Yao Lu
Max Bartolo
Alastair Moore
Sebastian Riedel
Pontus Stenetorp
AILaw
LRM
274
1,114
0
18 Apr 2021
Adversarial Machine Learning at Scale
Adversarial Machine Learning at Scale
Alexey Kurakin
Ian Goodfellow
Samy Bengio
AAML
253
3,102
0
04 Nov 2016
Google's Neural Machine Translation System: Bridging the Gap between
  Human and Machine Translation
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,724
0
26 Sep 2016
1