Universal Adversarial Triggers for Attacking and Analyzing NLP

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
20 August 2019
Eric Wallace
Shi Feng
Nikhil Kandpal
Matt Gardner
Sameer Singh
AAML SILM

Papers citing "Universal Adversarial Triggers for Attacking and Analyzing NLP"

50 / 662 papers shown
SUCEA: Reasoning-Intensive Retrieval for Adversarial Fact-checking through Claim Decomposition and Editing
Hongjun Liu
Yilun Zhao
Arman Cohan
Chen Zhao
AAML LRM
275
0
0
05 Jun 2025
Normative Conflicts and Shallow AI Alignment
Philosophical Studies (Philos. Stud.), 2025
Raphaël Millière
251
3
0
05 Jun 2025
Shaking to Reveal: Perturbation-Based Detection of LLM Hallucinations
Jinyuan Luo
Zhen Fang
Shouqing Yang
Seongheon Park
Ling Chen
AAML HILM
226
0
0
03 Jun 2025
SafeGenes: Evaluating the Adversarial Robustness of Genomic Foundation Models
Huixin Zhan
Clovis Barbour
Jason H. Moore
AAML
132
0
0
01 Jun 2025
A Red Teaming Roadmap Towards System-Level Safety
Zifan Wang
Christina Q. Knight
Jeremy Kritz
Willow Primack
Julian Michael
AAML
305
1
0
30 May 2025
Learning Safety Constraints for Large Language Models
Xin Chen
Yarden As
Andreas Krause
178
7
0
30 May 2025
Towards Secure MLOps: Surveying Attacks, Mitigation Strategies, and Research Challenges
Raj Patel
Himanshu Tripathi
Jasper Stone
Noorbakhsh Amiri Golilarz
Sudip Mittal
Shahram Rahimi
Vini Chaudhary
AAML
191
3
0
30 May 2025
TRAPDOC: Deceiving LLM Users by Injecting Imperceptible Phantom Tokens into Documents
Hyundong Jin
Sicheol Sung
Shinwoo Park
SeungYeop Baik
Yo-Sub Han
259
3
0
30 May 2025
Lifelong Safety Alignment for Language Models
Haoyu Wang
Zeyu Qin
Yifei Zhao
C. Du
Min Lin
Xueqian Wang
Tianyu Pang
KELM CLL
292
6
0
26 May 2025
GhostPrompt: Jailbreaking Text-to-image Generative Models based on Dynamic Optimization
Zixuan Chen
Hao Lin
Ke Xu
Xinghao Jiang
Tanfeng Sun
203
0
0
25 May 2025
Security Concerns for Large Language Models: A Survey
Miles Q. Li
Benjamin C. M. Fung
PILM ELM
796
18
0
24 May 2025
Accidental Vulnerability: Factors in Fine-Tuning that Shift Model Safeguards
Punya Syon Pandey
Samuel Simko
Kellin Pelrine
Zhijing Jin
AAML
263
0
0
22 May 2025
MixAT: Combining Continuous and Discrete Adversarial Training for LLMs
Csaba Dékány
Stefan Balauca
Robin Staab
Dimitar I. Dimitrov
Martin Vechev
AAML
303
1
0
22 May 2025
SPECTRE: Conditional System Prompt Poisoning to Hijack LLMs
Viet Pham
Thai Le
SILM
180
0
0
22 May 2025
Adversarially Pretrained Transformers May Be Universally Robust In-Context Learners
Soichiro Kumano
Hiroshi Kera
Toshihiko Yamasaki
AAML
549
1
0
20 May 2025
Chain-of-Thought Driven Adversarial Scenario Extrapolation for Robust Language Models
Md Rafi Ur Rashid
Vishnu Asutosh Dasu
Ye Wang
Gang Tan
Shagufta Mehnaz
AAML ELM
411
0
0
20 May 2025
Investigating the Vulnerability of LLM-as-a-Judge Architectures to Prompt-Injection Attacks
Narek Maloyan
Bislan Ashinov
Dmitry Namiot
AAML ELM
240
7
0
19 May 2025
Distribution Prompting: Understanding the Expressivity of Language Models Through the Next-Token Distributions They Can Produce
Haojin Wang
Zining Zhu
Freda Shi
279
1
0
18 May 2025
SPIRIT: Patching Speech Language Models against Jailbreak Attacks
Amirbek Djanibekov
Nurdaulet Mukhituly
Kentaro Inui
Hanan Aldarmaki
Nils Lukas
AAML
297
1
0
18 May 2025
Characterizing the Robustness of Black-Box LLM Planners Under Perturbed Observations with Adaptive Stress Testing
Neeloy Chakraborty
John Pohovey
Melkior Ornik
Katherine Driggs-Campbell
356
0
0
08 May 2025
Adversarial Attacks in Multimodal Systems: A Practitioner's Survey
Annual International Computer Software and Applications Conference (COMPSAC), 2025
Shashank Kapoor
Sanjay Surendranath Girija
Lakshit Arora
Dipen Pradhan
Ankit Shetgaonkar
Aman Raj
AAML
512
2
0
06 May 2025
Semantic Probabilistic Control of Language Models
Kareem Ahmed
Catarina G Belém
Padhraic Smyth
Sameer Singh
306
4
0
04 May 2025
Cannot See the Forest for the Trees: Invoking Heuristics and Biases to Elicit Irrational Choices of LLMs
Haoming Yang
Ke Ma
Xiaojun Jia
Yingfei Sun
Qianqian Xu
Qingming Huang
AAML
1.1K
4
0
03 May 2025
LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures
Francisco Aguilera-Martínez
Fernando Berzal
PILM
391
10
0
02 May 2025
Attack and defense techniques in large language models: A survey and new perspectives
Zhiyu Liao
Kang Chen
Yuanguo Lin
Kangkang Li
Yunxuan Liu
Hefeng Chen
Xingwang Huang
Yuanhui Yu
AAML
285
3
0
02 May 2025
OET: Optimization-based prompt injection Evaluation Toolkit
Jinsheng Pan
Xiaogeng Liu
Chaowei Xiao
AAML
341
0
0
01 May 2025
Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision
International Conference on Learning Representations (ICLR), 2025
Weicai Yan
Wang Lin
Zirun Guo
Ye Wang
Fangming Feng
Xiaoda Yang
Liang Luo
Tao Jin
DiffM
661
6
0
30 Apr 2025
Graph of Attacks: Improved Black-Box and Interpretable Jailbreaks for LLMs
Mohammad Akbar-Tajari
Mohammad Taher Pilehvar
Mohammad Mahmoody
AAML
207
1
0
26 Apr 2025
GraphAttack: Exploiting Representational Blindspots in LLM Safety Mechanisms
Sinan He
An Wang
165
0
0
17 Apr 2025
QAVA: Query-Agnostic Visual Attack to Large Vision-Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Yudong Zhang
Ruobing Xie
Jiansheng Chen
Xingwu Sun
Zhanhui Kang
Yu Wang
AAML
242
3
0
15 Apr 2025
NLP Security and Ethics, in the Wild
Transactions of the Association for Computational Linguistics (TACL), 2025
Heather Lent
Erick Galinkin
Yiyi Chen
Jens Myrup Pedersen
Leon Derczynski
Johannes Bjerva
SILM
413
1
0
09 Apr 2025
Revealing the Intrinsic Ethical Vulnerability of Aligned Large Language Models
Jiawei Lian
Jianhong Pan
L. Wang
Yi Wang
Shaohui Mei
Lap-Pui Chau
AAML
679
1
0
07 Apr 2025
On the Robustness of GUI Grounding Models Against Image Attacks
Haoren Zhao
Tianyi Chen
Zhen Wang
AAML
289
6
0
07 Apr 2025
Encrypted Prompt: Securing LLM Applications Against Unauthorized Actions
Shih-Han Chan
AAML
209
5
0
29 Mar 2025
MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks
Wenhao You
Bryan Hooi
Yiwei Wang
Longji Xu
Zong Ke
Ming Yang
Zi Huang
Yujun Cai
AAML
367
4
0
24 Mar 2025
In-House Evaluation Is Not Enough: Towards Robust Third-Party Flaw Disclosure for General-Purpose AI
Shayne Longpre
Kevin Klyman
Ruth E. Appel
Sayash Kapoor
Rishi Bommasani
...
Victoria Westerhoff
Yacine Jernite
Rumman Chowdhury
Percy Liang
Arvind Narayanan
ELM
391
8
0
21 Mar 2025
Prompt Injection Detection and Mitigation via AI Multi-Agent NLP Frameworks
Diego Gosmar
Deborah A. Dahl
Dario Gosmar
AAML
219
9
0
14 Mar 2025
CtrlRAG: Black-box Adversarial Attacks Based on Masked Language Models in Retrieval-Augmented Language Generation
Runqi Sui
AAML
210
3
0
10 Mar 2025
Life-Cycle Routing Vulnerabilities of LLM Router
Qiqi Lin
Xiaoyang Ji
Shengfang Zhai
Qingni Shen
Zhi-Li Zhang
Yuejian Fang
Yansong Gao
AAML
255
3
0
09 Mar 2025
Can Small Language Models Reliably Resist Jailbreak Attacks? A Comprehensive Evaluation
Wenhui Zhang
Huiyu Xu
Peng Kuang
Zeqing He
Ziqi Zhu
Kui Ren
AAML PILM
226
3
0
09 Mar 2025
Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models
Meghana Arakkal Rajeev
Rajkumar Ramamurthy
Prapti Trivedi
Vikas Yadav
Oluwanifemi Bamgbose
Sathwik Tejaswi Madhusudan
James Zou
Nazneen Rajani
AAML LRM
321
12
0
03 Mar 2025
Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models
Alberto Purpura
Sahil Wadhwa
Jesse Zymet
Akshay Gupta
Andy Luo
Melissa Kazemi Rad
Swapnil Shinde
Mohammad Sorower
AAML
997
5
0
03 Mar 2025
UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning
J.N. Zhang
Shuang Yang
B. Li
AAML LLMAG
356
7
0
28 Feb 2025
PolyPrompt: Automating Knowledge Extraction from Multilingual Language Models with Dynamic Prompt Generation
Nathan Roll
303
2
0
27 Feb 2025
Shh, don't say that! Domain Certification in LLMs
International Conference on Learning Representations (ICLR), 2025
Cornelius Emde
Alasdair Paren
Preetham Arvind
Maxime Kayser
Tom Rainforth
Thomas Lukasiewicz
Guohao Li
Juil Sock
Adel Bibi
348
4
0
26 Feb 2025
REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective
Simon Geisler
Tom Wollschlager
M. H. I. Abdalla
Vincent Cohen-Addad
Johannes Gasteiger
Stephan Günnemann
AAML
277
8
0
24 Feb 2025
Attention Eclipse: Manipulating Attention to Bypass LLM Safety-Alignment
Pedram Zaree
Md Abdullah Al Mamun
Quazi Mishkatul Alam
Yue Dong
Ihsen Alouani
Nael B. Abu-Ghazaleh
AAML
258
2
0
24 Feb 2025
Rethinking the Vulnerability of Concept Erasure and a New Method
Alex D. Richardson
Lucas Beerens
Dongdong Chen
DiffM
698
3
0
24 Feb 2025
Interrogating LLM design under a fair learning doctrine
Johnny Tian-Zheng Wei
Maggie Wang
Ameya Godbole
Jonathan H. Choi
Robin Jia
299
0
0
22 Feb 2025
Eliminating Backdoors in Neural Code Models for Secure Code Understanding
Weisong Sun
Yuchen Chen
Chunrong Fang
Yebo Feng
Yuan Xiao
An Guo
Quanjun Zhang
Yang Liu
Baowen Xu
Zhenyu Chen
AAML
343
1
0
21 Feb 2025