Single Character Perturbations Break LLM Alignment
arXiv: 2407.03232
3 July 2024
Leon Lin, Hannah Brown, Kenji Kawaguchi, Michael Shieh
AAML
Papers citing "Single Character Perturbations Break LLM Alignment"
5 / 5 papers shown
Red Teaming Large Reasoning Models
Jiawei Chen, Y. Yang, Chao Yu, Yu Tian, Zhi Cao, Linghao Li, Hang Su, Z. Yin, Zhaoxia Yin
LRM
29 Nov 2025
Unexplored flaws in multiple-choice VQA evaluations
Fabio Rosenthal, Sebastian Schmidt, Thorsten Graf, Thorsten Bagodonat, Stephan Günnemann, Leo Schwinn
27 Nov 2025
When Smiley Turns Hostile: Interpreting How Emojis Trigger LLMs' Toxicity
Shiyao Cui, Xijia Feng, Yingkang Wang, Junxiao Yang, Zhexin Zhang, Biplab Sikdar, Hongning Wang, Han Qiu, Shiyu Huang
14 Sep 2025
ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates
AAAI Conference on Artificial Intelligence (AAAI), 2024
Fengqing Jiang, Zhangchen Xu, Luyao Niu, Bill Yuchen Lin, Radha Poovendran
SILM
08 Jan 2025
Certifying LLM Safety against Adversarial Prompting
Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Aaron Jiaxun Li, Soheil Feizi, Himabindu Lakkaraju
AAML
06 Sep 2023