ResearchTrend.AI

Attacking Large Language Models with Projected Gradient Descent
arXiv: 2402.09154 (v2, latest) · 14 February 2024
Simon Geisler, Tom Wollschlager, M. H. I. Abdalla, Johannes Gasteiger, Stephan Günnemann
Tags: AAML, SILM
Links: arXiv (abs) · PDF · HTML · GitHub (22★)
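For readers skimming this listing, the paper's core technique, projected gradient descent over a continuously relaxed prompt, alternates a gradient step with a projection back onto the feasible set; for token distributions that set includes the probability simplex. The sketch below is illustrative only, not the paper's implementation: it runs PGD on a toy quadratic loss with a standard sorting-based simplex projection, and every name and value in it is made up for the example.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex
    (sorting-based algorithm of Duchi et al., 2008)."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u)
    ks = np.arange(1, v.size + 1)
    rho = np.nonzero(u * ks > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def pgd(loss_grad, x0, lr=0.1, steps=200):
    """Projected gradient descent: take a gradient step,
    then project back onto the feasible set (the simplex)."""
    x = project_simplex(x0)
    for _ in range(steps):
        x = project_simplex(x - lr * loss_grad(x))
    return x

# Toy objective: pull the iterate toward a target point that lies
# outside the simplex; PGD finds the closest feasible minimizer.
target = np.array([0.9, 0.7, -0.2])
grad = lambda x: 2.0 * (x - target)           # gradient of ||x - target||^2
x_star = pgd(grad, np.full(3, 1.0 / 3.0))
```

Because this toy objective is convex, the iterate converges to the Euclidean projection of `target` onto the simplex, here `[0.6, 0.4, 0.0]`; the paper's actual attack optimizes a relaxed one-hot token matrix against an LLM loss, which this sketch does not attempt.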

Papers citing "Attacking Large Language Models with Projected Gradient Descent"

50 / 66 papers shown
TASO: Jailbreak LLMs via Alternative Template and Suffix Optimization
Yanting Wang, Runpeng Geng, Jinghui Chen, Minhao Cheng, Jinyuan Jia
23 Nov 2025

Gradient Masters at BLP-2025 Task 1: Advancing Low-Resource NLP for Bengali using Ensemble-Based Adversarial Training for Hate Speech Detection
Syed Mohaiminul Hoque, Naimur Rahman, Md Sakhawat Hossain
23 Nov 2025

Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs
Yunhao Chen, Xin Wang, Juncheng Li, Yixu Wang, Jie Li, Yan Teng, Yingchun Wang, Xingjun Ma
Tags: AAML
16 Nov 2025

AdversariaLLM: A Unified and Modular Toolbox for LLM Robustness Research
Tim Beyer, Jonas Dornbusch, Jakob Steimle, Moritz Ladenburger, Leo Schwinn, Stephan Günnemann
Tags: AAML
06 Nov 2025

ASTRA: An Automated Framework for Strategy Discovery, Retrieval, and Evolution for Jailbreaking LLMs
Xu Liu, Yan Chen, Kan Ling, Yichi Zhu, Hengrun Zhang, Guisheng Fan, Huiqun Yu
Tags: AAML
04 Nov 2025

Align to Misalign: Automatic LLM Jailbreak with Meta-Optimized LLM Judges
Hamin Koo, Minseon Kim, Jaehyung Kim
03 Nov 2025

Diffusion LLMs are Natural Adversaries for any LLM
David Lüdke, Tom Wollschlager, Paul Ungermann, Stephan Günnemann, Leo Schwinn
Tags: DiffM
31 Oct 2025

Toward Understanding the Transferability of Adversarial Suffixes in Large Language Models
Sarah Ball, Niki Hasrati, Alexander Robey, Avi Schwarzschild, Frauke Kreuter, Zico Kolter, Andrej Risteski
Tags: AAML
24 Oct 2025

SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models
Hanbin Hong, Shuya Feng, Nima Naderloui, Shenao Yan, Jingyu Zhang, Biying Liu, Ali Arastehfard, Heqing Huang, Yuan Hong
Tags: AAML
17 Oct 2025

CoBia: Constructed Conversations Can Trigger Otherwise Concealed Societal Biases in LLMs
Nafiseh Nikeghbal, Amir Hossein Kargaran, Jana Diesner
10 Oct 2025

AgentTypo: Adaptive Typographic Prompt Injection Attacks against Black-box Multimodal Agents
Yanjie Li, Yiming Cao, Dong Wang, Bin Xiao
Tags: LLMAG, AAML
05 Oct 2025

Machine Unlearning Meets Adversarial Robustness via Constrained Interventions on LLMs
Fatmazohra Rezkellah, Ramzi Dakhmouche
Tags: AAML, MU
03 Oct 2025

Bypassing Prompt Guards in Production with Controlled-Release Prompting
Jaiden Fairoze, Sanjam Garg, Keewoo Lee, Mingyuan Wang
Tags: SILM, AAML
02 Oct 2025

Semantic Representation Attack against Aligned Large Language Models
Jiawei Lian, Jianhong Pan, L. Wang, Yi Wang, Shaohui Mei, Lap-Pui Chau
18 Sep 2025

On Surjectivity of Neural Networks: Can you elicit any behavior from your model?
Haozhe Jiang, Nika Haghtalab
26 Aug 2025

Universal and Transferable Adversarial Attack on Large Language Models Using Exponentiated Gradient Descent
Sajib Biswas, Mao Nishino, Samuel Jacob Chacko, Xiuwen Liu
Tags: AAML
20 Aug 2025

Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation
Huizhen Shu, Xuying Li, Qirui Wang, Yuji Kosuga, Mengqiu Tian, Zhuo Li
Tags: AAML, SILM
14 Aug 2025

Attention-Aware GNN-based Input Defense against Multi-Turn LLM Jailbreak
Zixuan Huang, Kecheng Huang, Lihao Yin, Bowei He, Huiling Zhen, Mingxuan Yuan, Zili Shao
Tags: AAML
09 Jul 2025

FORTRESS: Frontier Risk Evaluation for National Security and Public Safety
Christina Q. Knight, Kaustubh Deshpande, Ved Sirdeshmukh, Meher Mankikar, Scale Red Team, SEAL Research Team, Julian Michael
Tags: AAML, ELM
17 Jun 2025

Improving Large Language Model Safety with Contrastive Representation Learning
Samuel Simko, Mrinmaya Sachan, Bernhard Schölkopf, Zhijing Jin
Tags: AAML
13 Jun 2025

DAVSP: Safety Alignment for Large Vision-Language Models via Deep Aligned Visual Safety Prompt
Yitong Zhang, Jia Li, L. Cai, Ge Li
Tags: VLM
11 Jun 2025

Enhancing the Safety of Medical Vision-Language Models by Synthetic Demonstrations
Zhiyu Xue, Reza Abbasi-Asl, Ramtin Pedarsani
08 Jun 2025

Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models
Rui Cai, Bangzheng Li, Xiaofei Wen, Muhao Chen, Zhe Zhao
26 May 2025

Benign-to-Toxic Jailbreaking: Inducing Harmful Responses from Harmless Prompts
H. Kim, Minbeom Kim, Wonjun Lee, Kihyun Kim, Changick Kim
26 May 2025

Lifelong Safety Alignment for Language Models
Haoyu Wang, Zeyu Qin, Yifei Zhao, C. Du, Min Lin, Xueqian Wang, Tianyu Pang
Tags: KELM, CLL
26 May 2025

Does Representation Intervention Really Identify Desired Concepts and Elicit Alignment?
Hongzheng Yang, Yongqiang Chen, Zeyu Qin, Tongliang Liu, Chaowei Xiao, Kun Zhang, Bo Han
Tags: LLMSV
24 May 2025

OET: Optimization-based prompt injection Evaluation Toolkit
Jinsheng Pan, Xiaogeng Liu, Chaowei Xiao
Tags: AAML
01 May 2025

Edge-Based Learning for Improved Classification Under Adversarial Noise
Manish Kansana, Keyan Alexander Rahimi, Elias Hossain, Iman Dehzangi, Noorbakhsh Amiri Golilarz
Tags: AAML
25 Apr 2025

LLM-Safety Evaluations Lack Robustness
Tim Beyer, Sophie Xhonneux, Simon Geisler, Gauthier Gidel, Leo Schwinn, Stephan Günnemann
Tags: ALM, ELM
04 Mar 2025

Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models
Alberto Purpura, Sahil Wadhwa, Jesse Zymet, Akshay Gupta, Andy Luo, Melissa Kazemi Rad, Swapnil Shinde, Mohammad Sorower
Tags: AAML
03 Mar 2025

Same Question, Different Words: A Latent Adversarial Framework for Prompt Robustness
Tingchen Fu, Fazl Barez
Tags: AAML
03 Mar 2025

Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks
Hanjiang Hu, Alexander Robey, Changliu Liu
Tags: AAML, LLMSV
28 Feb 2025

REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective
Simon Geisler, Tom Wollschlager, M. H. I. Abdalla, Vincent Cohen-Addad, Johannes Gasteiger, Stephan Günnemann
Tags: AAML
24 Feb 2025

The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence
Tom Wollschlager, Jannes Elstner, Simon Geisler, Vincent Cohen-Addad, Stephan Günnemann, Johannes Gasteiger
Tags: LLMSV
24 Feb 2025

Safety Reasoning with Guidelines
Haoyu Wang, Zeyu Qin, Li Shen, Xueqian Wang, Minhao Cheng, Dacheng Tao
06 Feb 2025

You Can't Eat Your Cake and Have It Too: The Performance Degradation of LLMs with Jailbreak Defense
Venue: The Web Conference (WWW), 2025
Wuyuao Mai, Geng Hong, Pei Chen, Xudong Pan, Baojun Liu, Y. Zhang, Haixin Duan, Min Yang
Tags: AAML
21 Jan 2025

DiffusionAttacker: Diffusion-Driven Prompt Manipulation for LLM Jailbreak
Hao Wang, Hao Li, Junda Zhu, Xinyuan Wang, Changzai Pan, Shiyu Huang, Lei Sha
23 Dec 2024

PrisonBreak: Jailbreaking Large Language Models with at Most Twenty-Five Targeted Bit-flips
Zachary Coalson, Jeonghyun Woo, Shiyang Chen, Yu Sun, ..., Lishan Yang, Gururaj Saileshwar, Prashant J. Nair, Bo Fang, Sanghyun Hong
Tags: AAML
10 Dec 2024

Rethinking the Intermediate Features in Adversarial Attacks: Misleading Robotic Models via Adversarial Distillation
Ke Zhao, Huayang Huang, Miao Li, Yu Wu
Tags: AAML
21 Nov 2024

Plentiful Jailbreaks with String Compositions
Brian R. Y. Huang
Tags: AAML
01 Nov 2024

Emoji Attack: Enhancing Jailbreak Attacks Against Judge LLM Detection
Zhipeng Wei, Yuqi Liu, N. Benjamin Erichson
Tags: AAML
01 Nov 2024

Adversarial Attacks on Large Language Models Using Regularized Relaxation
Samuel Jacob Chacko, Sajib Biswas, Chashi Mahiul Islam, Fatema Tabassum Liza, Xiuwen Liu
Tags: AAML
24 Oct 2024

AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation
Zijun Wang, Haoqin Tu, J. Mei, Bingchen Zhao, Yanjie Wang, Cihang Xie
11 Oct 2024

JAILJUDGE: A Comprehensive Jailbreak Judge Benchmark with Multi-Agent Enhanced Explanation Evaluation Framework
Fan Liu, Yue Feng, Zhao Xu, Lixin Su, Xinyu Ma, D. Yin, Hao Liu
Tags: ELM
11 Oct 2024

Automated Red Teaming with GOAT: the Generative Offensive Agent Tester
Maya Pavlova, Erik Brinkman, Krithika Iyer, Vítor Albiero, Joanna Bitton, Hailey Nguyen, Haibin Zhang, Cristian Canton Ferrer, Ivan Evtimov, Aaron Grattafiori
Tags: ALM
02 Oct 2024

FlipAttack: Jailbreak LLMs via Flipping
Yue Liu, Xiaoxin He, Miao Xiong, Jinlan Fu, Shumin Deng, Bryan Hooi
Tags: AAML
02 Oct 2024

Endless Jailbreaks with Bijection Learning
Venue: International Conference on Learning Representations (ICLR), 2024
Brian R. Y. Huang, Maximilian Li, Leonard Tang
Tags: AAML
02 Oct 2024

Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models
Bang An, Sicheng Zhu, Ruiyi Zhang, Michael-Andrei Panaitescu-Liess, Yuancheng Xu, Furong Huang
Tags: AAML
01 Sep 2024

LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
Nathaniel Li, Ziwen Han, Ian Steneker, Willow Primack, Riley Goodside, Hugh Zhang, Zifan Wang, Cristina Menghini, Summer Yue
Tags: AAML, MU
27 Aug 2024

Discrete Randomized Smoothing Meets Quantum Computing
Venue: International Conference on Quantum Computing and Engineering (QCE), 2024
Md. Nazmus Sakib, Aman Saxena, Nicola Franco, Md Mashrur Arifin, Stephan Günnemann
Tags: AAML
01 Aug 2024

Page 1 of 2