ResearchTrend.AI

Gradient-based Adversarial Attacks against Text Transformers
Chuan Guo, Alexandre Sablayrolles, Hervé Jégou, Douwe Kiela
15 April 2021 · arXiv:2104.13733 · SILM

Papers citing "Gradient-based Adversarial Attacks against Text Transformers"
15 / 15 papers shown

  • LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures
    Francisco Aguilera-Martínez, Fernando Berzal
    PILM · 02 May 2025
  • Graph of Attacks: Improved Black-Box and Interpretable Jailbreaks for LLMs
    Mohammad Akbar-Tajari, Mohammad Taher Pilehvar, Mohammad Mahmoody
    AAML · 26 Apr 2025
  • Revealing the Intrinsic Ethical Vulnerability of Aligned Large Language Models
    Jiawei Lian, Jianhong Pan, L. Wang, Yi Wang, Shaohui Mei, Lap-Pui Chau
    AAML · 07 Apr 2025
  • Single-pass Detection of Jailbreaking Input in Large Language Models
    Leyla Naz Candogan, Yongtao Wu, Elias Abad Rocamora, Grigorios G. Chrysos, V. Cevher
    AAML · 24 Feb 2025
  • Universal Adversarial Attack on Aligned Multimodal LLMs
    Temurbek Rahmatullaev, Polina Druzhinina, Matvey Mikhalchuk, Andrey Kuznetsov, Anton Razzhigaev
    AAML · 11 Feb 2025
  • FIT-Print: Towards False-claim-resistant Model Ownership Verification via Targeted Fingerprint
    Shuo Shao, Haozhe Zhu, Hongwei Yao, Yiming Li, Tianwei Zhang, Z. Qin, Kui Ren
    28 Jan 2025
  • DiffusionAttacker: Diffusion-Driven Prompt Manipulation for LLM Jailbreak
    Hao Wang, Hao Li, Junda Zhu, Xinyuan Wang, C. Pan, Minlie Huang, Lei Sha
    23 Dec 2024
  • Human-Readable Adversarial Prompts: An Investigation into LLM Vulnerabilities Using Situational Context
    Nilanjana Das, Edward Raff, Manas Gaur
    AAML · 20 Dec 2024
  • In-Context Experience Replay Facilitates Safety Red-Teaming of Text-to-Image Diffusion Models
    Zhi-Yi Chin, Kuan-Chen Mu, Mario Fritz, Pin-Yu Chen
    DiffM · 25 Nov 2024
  • Deciphering the Chaos: Enhancing Jailbreak Attacks via Adversarial Prompt Translation
    Qizhang Li, Xiaochen Yang, W. Zuo, Yiwen Guo
    AAML · 15 Oct 2024
  • "Not Aligned" is Not "Malicious": Being Careful about Hallucinations of Large Language Models' Jailbreak
    Lingrui Mei, Shenghua Liu, Yiwei Wang, Baolong Bi, Jiayi Mao, Xueqi Cheng
    AAML · 17 Jun 2024
  • SoK: Leveraging Transformers for Malware Analysis
    Pradip Kunwar, Kshitiz Aryal, Maanak Gupta, Mahmoud Abdelsalam, Elisa Bertino
    27 May 2024
  • ImgTrojan: Jailbreaking Vision-Language Models with ONE Image
    Xijia Tao, Shuai Zhong, Lei Li, Qi Liu, Lingpeng Kong
    05 Mar 2024
  • Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space
    Leo Schwinn, David Dobre, Sophie Xhonneux, Gauthier Gidel, Stephan Gunnemann
    AAML · 14 Feb 2024
  • Generating Natural Language Adversarial Examples
    M. Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani B. Srivastava, Kai-Wei Chang
    AAML · 21 Apr 2018