2409.00137
Cited By
Emerging Vulnerabilities in Frontier Models: Multi-Turn Jailbreak Attacks
29 August 2024
Tom Gibbs
Ethan Kosak-Hine
George Ingebretsen
Jason Zhang
Julius Broomfield
Sara Pieri
Reihaneh Iranmanesh
Reihaneh Rabbany
Kellin Pelrine
AAML
Papers citing
"Emerging Vulnerabilities in Frontier Models: Multi-Turn Jailbreak Attacks"
5 / 5 papers shown
Title
Strategize Globally, Adapt Locally: A Multi-Turn Red Teaming Agent with Dual-Level Learning
S. Chen
Xiao Yu
Ninareh Mehrabi
Rahul Gupta
Zhou Yu
Ruoxi Jia
AAML
LLMAG
53
0
0
02 Apr 2025
Jigsaw Puzzles: Splitting Harmful Questions to Jailbreak Large Language Models
Hao Yang
Lizhen Qu
Ehsan Shareghi
Gholamreza Haffari
AAML
36
1
0
15 Oct 2024
You Know What I'm Saying: Jailbreak Attack via Implicit Reference
Tianyu Wu
Lingrui Mei
Ruibin Yuan
Lujun Li
Wei Xue
Yike Guo
48
1
0
04 Oct 2024
LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
Nathaniel Li
Ziwen Han
Ian Steneker
Willow Primack
Riley Goodside
Hugh Zhang
Zifan Wang
Cristina Menghini
Summer Yue
AAML
MU
46
40
0
27 Aug 2024
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,953
0
04 Mar 2022