ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2401.09798
  4. Cited By
All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks

All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks

18 January 2024
Kazuhiro Takemoto
ArXivPDFHTML

Papers citing "All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks"

7 / 7 papers shown
Title
In-Context Experience Replay Facilitates Safety Red-Teaming of Text-to-Image Diffusion Models
In-Context Experience Replay Facilitates Safety Red-Teaming of Text-to-Image Diffusion Models
Zhi-Yi Chin
Kuan-Chen Mu
Mario Fritz
Pin-Yu Chen
DiffM
83
0
0
25 Nov 2024
Stealthy Jailbreak Attacks on Large Language Models via Benign Data Mirroring
Stealthy Jailbreak Attacks on Large Language Models via Benign Data Mirroring
Honglin Mu
Han He
Yuxin Zhou
Yunlong Feng
Yang Xu
...
Zeming Liu
Xudong Han
Qi Shi
Qingfu Zhu
Wanxiang Che
AAML
28
1
0
28 Oct 2024
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Maksym Andriushchenko
Francesco Croce
Nicolas Flammarion
AAML
79
155
0
02 Apr 2024
Survey of Vulnerabilities in Large Language Models Revealed by
  Adversarial Attacks
Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks
Erfan Shayegani
Md Abdullah Al Mamun
Yu Fu
Pedram Zaree
Yue Dong
Nael B. Abu-Ghazaleh
AAML
138
139
0
16 Oct 2023
A Review of ChatGPT Applications in Education, Marketing, Software
  Engineering, and Healthcare: Benefits, Drawbacks, and Research Directions
A Review of ChatGPT Applications in Education, Marketing, Software Engineering, and Healthcare: Benefits, Drawbacks, and Research Directions
Mohammad Fraiwan
Natheer Khasawneh
38
35
0
29 Apr 2023
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
1