Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.16006
Cited By
ASETF: A Novel Method for Jailbreak Attack on LLMs through Translate Suffix Embeddings
25 February 2024
Hao Wang
Hao Li
Minlie Huang
Lei Sha
AAML
Re-assign community
ArXiv
PDF
HTML
Papers citing
"ASETF: A Novel Method for Jailbreak Attack on LLMs through Translate Suffix Embeddings"
15 / 15 papers shown
Title
PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization
Yang Jiao
X. Wang
Kai Yang
AAML
SILM
31
0
0
10 Apr 2025
Sugar-Coated Poison: Benign Generation Unlocks LLM Jailbreaking
Yu-Hang Wu
Yu-Jie Xiong
Jie-Zhang
AAML
25
0
0
08 Apr 2025
Adversarial Tokenization
Renato Lui Geh
Zilei Shao
Guy Van den Broeck
SILM
AAML
87
0
0
04 Mar 2025
GuidedBench: Equipping Jailbreak Evaluation with Guidelines
Ruixuan Huang
Xunguang Wang
Zongjie Li
Daoyuan Wu
Shuai Wang
ALM
ELM
55
0
0
24 Feb 2025
KDA: A Knowledge-Distilled Attacker for Generating Diverse Prompts to Jailbreak LLMs
Buyun Liang
Kwan Ho Ryan Chan
D. Thaker
Jinqi Luo
René Vidal
AAML
41
0
0
05 Feb 2025
DiffusionAttacker: Diffusion-Driven Prompt Manipulation for LLM Jailbreak
Hao Wang
Hao Li
Junda Zhu
Xinyuan Wang
C. Pan
Minlie Huang
Lei Sha
95
0
0
23 Dec 2024
Human-Readable Adversarial Prompts: An Investigation into LLM Vulnerabilities Using Situational Context
Nilanjana Das
Edward Raff
Manas Gaur
AAML
101
1
0
20 Dec 2024
Fast Convergence of
Φ
Φ
Φ
-Divergence Along the Unadjusted Langevin Algorithm and Proximal Sampler
Siddharth Mitra
Andre Wibisono
47
0
0
14 Oct 2024
Recent advancements in LLM Red-Teaming: Techniques, Defenses, and Ethical Considerations
Tarun Raheja
Nilay Pochhi
AAML
46
1
0
09 Oct 2024
Human-Interpretable Adversarial Prompt Attack on Large Language Models with Situational Context
Nilanjana Das
Edward Raff
Manas Gaur
AAML
35
2
0
19 Jul 2024
Harnessing the Plug-and-Play Controller by Prompting
Hao Wang
Lei Sha
19
3
0
06 Feb 2024
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
113
300
0
19 Sep 2023
Diffusion-LM Improves Controllable Text Generation
Xiang Lisa Li
John Thickstun
Ishaan Gulrajani
Percy Liang
Tatsunori B. Hashimoto
AI4CE
171
772
0
27 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,881
0
04 Mar 2022
Unsolved Problems in ML Safety
Dan Hendrycks
Nicholas Carlini
John Schulman
Jacob Steinhardt
173
272
0
28 Sep 2021
1