Leveraging the Context through Multi-Round Interactions for Jailbreaking Attacks

14 February 2024
Yixin Cheng, Markos Georgopoulos, V. Cevher, Grigorios G. Chrysos
AAML

Papers citing "Leveraging the Context through Multi-Round Interactions for Jailbreaking Attacks" (15 of 15 shown)
REVEAL: Multi-turn Evaluation of Image-Input Harms for Vision LLM
  Madhur Jindal, Saurabh Deshpande · AAML · 0 citations · 07 May 2025
Bayesian Optimization of Robustness Measures Using Randomized GP-UCB-based Algorithms under Input Uncertainty
  Yu Inatsu · 0 citations · 04 Apr 2025
On the Robustness of Transformers against Context Hijacking for Linear Classification
  Tianle Li, Chenyang Zhang, Xingwu Chen, Yuan Cao, Difan Zou · 0 citations · 24 Feb 2025
Preventing Jailbreak Prompts as Malicious Tools for Cybercriminals: A Cyber Defense Perspective
  Jean Marie Tshimula, Xavier Ndona, D'Jeff K. Nkashama, Pierre Martin Tardif, F. Kabanza, Marc Frappier, Shengrui Wang · SILM · 0 citations · 25 Nov 2024
Diversity Helps Jailbreak Large Language Models
  Weiliang Zhao, Daniel Ben-Levi, Wei Hao, Junfeng Yang, Chengzhi Mao · AAML · 0 citations · 06 Nov 2024
Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents
  Priyanshu Kumar, Elaine Lau, Saranya Vijayakumar, Tu Trinh, Scale Red Team, ..., Sean Hendryx, Shuyan Zhou, Matt Fredrikson, Summer Yue, Zifan Wang · LLMAG · 17 citations · 11 Oct 2024
You Know What I'm Saying: Jailbreak Attack via Implicit Reference
  Tianyu Wu, Lingrui Mei, Ruibin Yuan, Lujun Li, Wei Xue, Yike Guo · 1 citation · 04 Oct 2024
AutoJailbreak: Exploring Jailbreak Attacks and Defenses through a Dependency Lens
  Lin Lu, Hai Yan, Zenghui Yuan, Jiawen Shi, Wenqi Wei, Pin-Yu Chen, Pan Zhou · AAML · 8 citations · 06 Jun 2024
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
  Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion · AAML · 155 citations · 02 Apr 2024
Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack
  M. Russinovich, Ahmed Salem, Ronen Eldan · 75 citations · 02 Apr 2024
How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries
  Somnath Banerjee, Sayan Layek, Rima Hazra, Animesh Mukherjee · 10 citations · 23 Feb 2024
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
  Jiahao Yu, Xingwei Lin, Zheng Yu, Xinyu Xing · SILM · 292 citations · 19 Sep 2023
Assessing the impact of contextual information in hate speech detection
  Juan Manuel Pérez, Franco Luque, Demián Zayat, Martín Kondratzky, Agustín Moro, ..., Joaquín Zajac, Paula Miguel, Natalia Debandi, Agustin Gravano, Viviana Cotik · 29 citations · 02 Oct 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
  Deep Ganguli, Liane Lovitt, John Kernion, Amanda Askell, Yuntao Bai, ..., Nicholas Joseph, Sam McCandlish, C. Olah, Jared Kaplan, Jack Clark · 441 citations · 23 Aug 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou · LM&Ro, LRM, AI4CE, ReLM · 8,261 citations · 28 Jan 2022