arXiv: 2405.21018
Improved Techniques for Optimization-Based Jailbreaking on Large Language Models

31 May 2024
Xiaojun Jia
Tianyu Pang
Chao Du
Yihao Huang
Jindong Gu
Yang Liu
Xiaochun Cao
Min-Bin Lin
    AAML

Papers citing "Improved Techniques for Optimization-Based Jailbreaking on Large Language Models"

14 / 14 papers shown
Cannot See the Forest for the Trees: Invoking Heuristics and Biases to Elicit Irrational Choices of LLMs
Haoming Yang
Ke Ma
X. Jia
Yingfei Sun
Qianqian Xu
Q. Huang
AAML
45
0
0
03 May 2025
Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
Thomas Winninger
Boussad Addad
Katarzyna Kapusta
AAML
61
0
0
08 Mar 2025
MAD-MAX: Modular And Diverse Malicious Attack MiXtures for Automated LLM Red Teaming
Stefan Schoepf
Muhammad Zaid Hameed
Ambrish Rawat
Kieran Fraser
Giulio Zizzo
Giandomenico Cornacchia
Mark Purcell
31
0
0
08 Mar 2025
Shh, don't say that! Domain Certification in LLMs
Cornelius Emde
Alasdair Paren
Preetham Arvind
Maxime Kayser
Tom Rainforth
Thomas Lukasiewicz
Bernard Ghanem
Philip H. S. Torr
Adel Bibi
43
1
0
26 Feb 2025
On the Role of Attention Heads in Large Language Model Safety
Z. Zhou
Haiyang Yu
Xinghua Zhang
Rongwu Xu
Fei Huang
Kun Wang
Yang Liu
Junfeng Fang
Yongbin Li
43
5
0
17 Oct 2024
Perception-guided Jailbreak against Text-to-Image Models
Yihao Huang
Le Liang
Tianlin Li
Xiaojun Jia
Run Wang
Weikai Miao
G. Pu
Yang Liu
33
6
0
20 Aug 2024
SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
Xunguang Wang
Daoyuan Wu
Zhenlan Ji
Zongjie Li
Pingchuan Ma
Shuai Wang
Yingjiu Li
Yang Liu
Ning Liu
Juergen Rahmel
AAML
57
6
0
08 Jun 2024
Responsible Generative AI: What to Generate and What Not
Jindong Gu
16
8
0
08 Apr 2024
EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models
Weikang Zhou
Xiao Wang
Limao Xiong
Han Xia
Yingshuang Gu
...
Lijun Li
Jing Shao
Tao Gui
Qi Zhang
Xuanjing Huang
71
29
0
18 Mar 2024
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
Xiangming Gu
Xiaosen Zheng
Tianyu Pang
Chao Du
Qian Liu
Ye Wang
Jing Jiang
Min-Bin Lin
LLMAG
LM&Ro
35
47
0
13 Feb 2024
On the Multi-modal Vulnerability of Diffusion Models
Dingcheng Yang
Yang Bai
Xiaojun Jia
Yang Liu
Xiaochun Cao
Wenjian Yu
34
11
0
02 Feb 2024
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
110
292
0
19 Sep 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Gradient-based Adversarial Attacks against Text Transformers
Chuan Guo
Alexandre Sablayrolles
Hervé Jégou
Douwe Kiela
SILM
93
225
0
15 Apr 2021