Single Character Perturbations Break LLM Alignment
arXiv:2407.03232 · 3 July 2024
Leon Lin, Hannah Brown, Kenji Kawaguchi, Michael Shieh
AAML

Papers citing "Single Character Perturbations Break LLM Alignment"

5 papers
Red Teaming Large Reasoning Models
Jiawei Chen, Y. Yang, Chao Yu, Yu Tian, Zhi Cao, Linghao Li, Hang Su, Z. Yin, Zhaoxia Yin
LRM · 29 Nov 2025
Unexplored flaws in multiple-choice VQA evaluations
Fabio Rosenthal, Sebastian Schmidt, Thorsten Graf, Thorsten Bagodonat, Stephan Günnemann, Leo Schwinn
27 Nov 2025
When Smiley Turns Hostile: Interpreting How Emojis Trigger LLMs' Toxicity
Shiyao Cui, Xijia Feng, Yingkang Wang, Junxiao Yang, Zhexin Zhang, Biplab Sikdar, Hongning Wang, Han Qiu, Shiyu Huang
14 Sep 2025
ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates
AAAI Conference on Artificial Intelligence (AAAI), 2024
Fengqing Jiang, Zhangchen Xu, Luyao Niu, Bill Yuchen Lin, Radha Poovendran
SILM · 08 Jan 2025
Certifying LLM Safety against Adversarial Prompting
Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Aaron Jiaxun Li, Soheil Feizi, Himabindu Lakkaraju
AAML · 06 Sep 2023