2402.13517
Cited By
Round Trip Translation Defence against Large Language Model Jailbreaking Attacks
21 February 2024
Canaan Yung
H. M. Dolatabadi
S. Erfani
Christopher Leckie
AAML
Papers citing
"Round Trip Translation Defence against Large Language Model Jailbreaking Attacks"
4 / 4 papers shown
Single-pass Detection of Jailbreaking Input in Large Language Models
Leyla Naz Candogan
Yongtao Wu
Elias Abad Rocamora
Grigorios G. Chrysos
V. Cevher
AAML
45
0
0
24 Feb 2025
AdvBDGen: Adversarially Fortified Prompt-Specific Fuzzy Backdoor Generator Against LLM Alignment
Pankayaraj Pathmanathan
Udari Madhushani Sehwag
Michael-Andrei Panaitescu-Liess
Furong Huang
SILM
AAML
35
0
0
15 Oct 2024
Recent advancements in LLM Red-Teaming: Techniques, Defenses, and Ethical Considerations
Tarun Raheja
Nilay Pochhi
AAML
46
1
0
09 Oct 2024
Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
Apurv Verma
Satyapriya Krishna
Sebastian Gehrmann
Madhavan Seshadri
Anu Pradhan
Tom Ault
Leslie Barrett
David Rabinowitz
John Doucette
Nhathai Phan
47
6
0
20 Jul 2024