2403.14725
Testing the Limits of Jailbreaking Defenses with the Purple Problem
20 March 2024
Taeyoun Kim
Suhas Kotha
Aditi Raghunathan
AAML
Papers citing "Testing the Limits of Jailbreaking Defenses with the Purple Problem"
7 / 7 papers shown
Title
FLAME: Flexible LLM-Assisted Moderation Engine
Ivan Bakulin
Ilia Kopanichuk
Iaroslav Bespalov
Nikita Radchenko
V. Shaposhnikov
Dmitry V. Dylov
Ivan Oseledets
74
0
0
13 Feb 2025
Endless Jailbreaks with Bijection Learning
Brian R. Y. Huang
Maximilian Li
Leonard Tang
AAML
51
5
0
02 Oct 2024
Attacking Large Language Models with Projected Gradient Descent
Simon Geisler
Tom Wollschlager
M. H. I. Abdalla
Johannes Gasteiger
Stephan Günnemann
AAML
SILM
34
48
0
14 Feb 2024
On the Risk of Misinformation Pollution with Large Language Models
Yikang Pan
Liangming Pan
Wenhu Chen
Preslav Nakov
Min-Yen Kan
W. Wang
DeLMO
188
105
0
23 May 2023
Critical Perspectives: A Benchmark Revealing Pitfalls in PerspectiveAPI
Lorena Piedras
Lucas Rosenblatt
Julia Wilkins
21
7
0
05 Jan 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Gradient-based Adversarial Attacks against Text Transformers
Chuan Guo
Alexandre Sablayrolles
Hervé Jégou
Douwe Kiela
SILM
93
162
0
15 Apr 2021