ResearchTrend.AI
Emerging Vulnerabilities in Frontier Models: Multi-Turn Jailbreak Attacks

29 August 2024
Tom Gibbs
Ethan Kosak-Hine
George Ingebretsen
Jason Zhang
Julius Broomfield
Sara Pieri
Reihaneh Iranmanesh
Reihaneh Rabbany
Kellin Pelrine
    AAML
arXiv: 2409.00137

Papers citing "Emerging Vulnerabilities in Frontier Models: Multi-Turn Jailbreak Attacks"

5 / 5 papers shown
Strategize Globally, Adapt Locally: A Multi-Turn Red Teaming Agent with Dual-Level Learning
S. Chen
Xiao Yu
Ninareh Mehrabi
Rahul Gupta
Zhou Yu
Ruoxi Jia
AAML
LLMAG
53
0
0
02 Apr 2025
Jigsaw Puzzles: Splitting Harmful Questions to Jailbreak Large Language Models
Hao Yang
Lizhen Qu
Ehsan Shareghi
Gholamreza Haffari
AAML
36
1
0
15 Oct 2024
You Know What I'm Saying: Jailbreak Attack via Implicit Reference
Tianyu Wu
Lingrui Mei
Ruibin Yuan
Lujun Li
Wei Xue
Yike Guo
48
1
0
04 Oct 2024
LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
Nathaniel Li
Ziwen Han
Ian Steneker
Willow Primack
Riley Goodside
Hugh Zhang
Zifan Wang
Cristina Menghini
Summer Yue
AAML
MU
46
40
0
27 Aug 2024
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,953
0
04 Mar 2022