Semantic Representation Attack against Aligned Large Language Models

Main: 10 pages
3 figures
15 tables
Bibliography: 5 pages
Appendix: 22 pages
Abstract

Large Language Models (LLMs) increasingly rely on alignment techniques to prevent harmful outputs. Despite these safeguards, attackers can circumvent them by crafting prompts that induce LLMs to generate harmful content.
