ResearchTrend.AI
PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition


13 May 2024
Ziyang Zhang
Qizhen Zhang
Jakob N. Foerster
    AAML

Papers citing "PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition"

8 / 8 papers shown
LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures
Francisco Aguilera-Martínez
Fernando Berzal
PILM
50
0
0
02 May 2025
Prompt Flow Integrity to Prevent Privilege Escalation in LLM Agents
Juhee Kim
Woohyuk Choi
Byoungyoung Lee
LLMAG
79
1
0
17 Mar 2025
Foot-In-The-Door: A Multi-turn Jailbreak for LLMs
Zixuan Weng
Xiaolong Jin
Jinyuan Jia
X. Zhang
AAML
81
0
0
27 Feb 2025
LOB-Bench: Benchmarking Generative AI for Finance - an Application to Limit Order Book Data
Peer Nagy
Sascha Frey
Kang Li
Bidipta Sarkar
Svitlana Vyetrenko
Stefan Zohren
Ani Calinescu
Jakob Foerster
81
1
0
13 Feb 2025
Smoothed Embeddings for Robust Language Models
Ryo Hase
Md. Rafi Ur Rashid
Ashley Lewis
Jing Liu
T. Koike-Akino
K. Parsons
Y. Wang
AAML
44
0
0
27 Jan 2025
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
Mikayel Samvelyan
Sharath Chandra Raparthy
Andrei Lupu
Eric Hambro
Aram H. Markosyan
...
Minqi Jiang
Jack Parker-Holder
Jakob Foerster
Tim Rocktäschel
Roberta Raileanu
SyDa
68
62
0
26 Feb 2024
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
113
300
0
19 Sep 2023
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
218
441
0
23 Aug 2022