Single Character Perturbations Break LLM Alignment
arXiv:2407.03232 · 3 July 2024
Leon Lin, Hannah Brown, Kenji Kawaguchi, Michael Shieh
AAML

Papers citing "Single Character Perturbations Break LLM Alignment"

5 papers
Red Teaming Large Reasoning Models
Jiawei Chen, Y. Yang, Chao Yu, Yu Tian, Zhi Cao, Linghao Li, Hang Su, Z. Yin, Zhaoxia Yin
LRM · 29 Nov 2025
Unexplored flaws in multiple-choice VQA evaluations
Fabio Rosenthal, Sebastian Schmidt, Thorsten Graf, Thorsten Bagodonat, Stephan Günnemann, Leo Schwinn
27 Nov 2025
When Smiley Turns Hostile: Interpreting How Emojis Trigger LLMs' Toxicity
Shiyao Cui, Xijia Feng, Yingkang Wang, Junxiao Yang, Zhexin Zhang, Biplab Sikdar, Hongning Wang, Han Qiu, Shiyu Huang
14 Sep 2025
ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates
AAAI Conference on Artificial Intelligence (AAAI), 2024
Fengqing Jiang, Zhangchen Xu, Luyao Niu, Bill Yuchen Lin, Radha Poovendran
SILM · 08 Jan 2025
Certifying LLM Safety against Adversarial Prompting
Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Aaron Jiaxun Li, Soheil Feizi, Himabindu Lakkaraju
AAML · 06 Sep 2023