Fight Back Against Jailbreaking via Prompt Adversarial Tuning

arXiv:2402.06255 · 9 February 2024
Yichuan Mo, Yuji Wang, Zeming Wei, Yisen Wang
AAML, SILM

Papers citing "Fight Back Against Jailbreaking via Prompt Adversarial Tuning"

Foot-In-The-Door: A Multi-turn Jailbreak for LLMs
Zixuan Weng, Xiaolong Jin, Jinyuan Jia, X. Zhang
AAML
27 Feb 2025

SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
Xunguang Wang, Daoyuan Wu, Zhenlan Ji, Zongjie Li, Pingchuan Ma, Shuai Wang, Yingjiu Li, Yang Liu, Ning Liu, Juergen Rahmel
AAML
08 Jun 2024

Towards Building a Robust Toxicity Predictor
Dmitriy Bespalov, Sourav S. Bhabesh, Yi Xiang, Liutong Zhou, Yanjun Qi
AAML
09 Apr 2024

Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks
Erfan Shayegani, Md Abdullah Al Mamun, Yu Fu, Pedram Zaree, Yue Dong, Nael B. Abu-Ghazaleh
AAML
16 Oct 2023

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM, ALM
04 Mar 2022