
arXiv:1908.07125

Universal Adversarial Triggers for Attacking and Analyzing NLP

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
20 August 2019
Eric Wallace
Shi Feng
Nikhil Kandpal
Matt Gardner
Sameer Singh
AAML, SILM

Papers citing "Universal Adversarial Triggers for Attacking and Analyzing NLP"

50 / 662 papers shown
Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models
Z. Wang
Jie M. Zhang
Shiguang Shan
Xilin Chen
AAML
29 Nov 2025
TASO: Jailbreak LLMs via Alternative Template and Suffix Optimization
Yanting Wang
Runpeng Geng
Jinghui Chen
Minhao Cheng
Jinyuan Jia
23 Nov 2025
PARROT: Persuasion and Agreement Robustness Rating of Output Truth -- A Sycophancy Robustness Benchmark for LLMs
Yusuf Çelebi
Mahmoud El Hussieni
Özay Ezerceli
AAML
21 Nov 2025
SteganoBackdoor: Stealthy and Data-Efficient Backdoor Attacks on Language Models
Eric Xue
Ruiyi Zhang
Zijun Zhang
AAML
18 Nov 2025
Training Language Models to Explain Their Own Computations
Belinda Z. Li
Zifan Carl Guo
Vincent Huang
Jacob Steinhardt
Jacob Andreas
LRM
11 Nov 2025
AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models
Aashray Reddy
Andrew Zagula
Nicholas Saban
AAML, MU, SILM
04 Nov 2025
"Give a Positive Review Only": An Early Investigation Into In-Paper Prompt Injection Attacks and Defenses for AI Reviewers
Qin Zhou
Zhexin Zhang
Zhi Li
Limin Sun
AAML
03 Nov 2025
NeuroGenPoisoning: Neuron-Guided Attacks on Retrieval-Augmented Generation of LLM via Genetic Optimization of External Knowledge
Hanyu Zhu
Lance Fiondella
Jiawei Yuan
K. Zeng
Long Jiao
SILM, AAML, KELM
24 Oct 2025
Toward Understanding the Transferability of Adversarial Suffixes in Large Language Models
Sarah Ball
Niki Hasrati
Alexander Robey
Avi Schwarzschild
Frauke Kreuter
Zico Kolter
Andrej Risteski
AAML
24 Oct 2025
Can Transformer Memory Be Corrupted? Investigating Cache-Side Vulnerabilities in Large Language Models
Elias Hossain
Swayamjit Saha
Somshubhra Roy
Ravi Prasad
20 Oct 2025
Online Learning Defense against Iterative Jailbreak Attacks via Prompt Optimization
Masahiro Kaneko
Zeerak Talat
Timothy Baldwin
AAML
19 Oct 2025
SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models
Hanbin Hong
Shuya Feng
Nima Naderloui
Shenao Yan
Jingyu Zhang
Biying Liu
Ali Arastehfard
Heqing Huang
Yuan Hong
AAML
17 Oct 2025
Are My Optimized Prompts Compromised? Exploring Vulnerabilities of LLM-based Optimizers
Andrew Zhao
Reshmi Ghosh
Vitor Carvalho
Emily Lawton
Keegan Hines
Gao Huang
Jack W. Stokes
AAML, SILM
16 Oct 2025
Selective Adversarial Attacks on LLM Benchmarks
Ivan Dubrovsky
Anastasia Orlova
Illarion Iov
Nina Gubina
Irena Gureeva
Alexey Zaytsev
AAML
15 Oct 2025
In-Browser LLM-Guided Fuzzing for Real-Time Prompt Injection Testing in Agentic AI Browsers
Avihay Cohen
SILM, LLMAG, AI4CE
15 Oct 2025
CoBia: Constructed Conversations Can Trigger Otherwise Concealed Societal Biases in LLMs
Nafiseh Nikeghbal
Amir Hossein Kargaran
Jana Diesner
10 Oct 2025
GREAT: Generalizable Backdoor Attacks in RLHF via Emotion-Aware Trigger Synthesis
Subrat Kishore Dutta
Yuelin Xu
Piyush Pant
Xiao Zhang
AAML
10 Oct 2025
SyncHuman: Synchronizing 2D and 3D Generative Models for Single-view Human Reconstruction
Wenyue Chen
Peng Li
Wangguandong Zheng
Chengfeng Zhao
Mengfei Li
Yaolong Zhu
Zhiyang Dou
Ronggang Wang
Yuan Liu
3DH, 3DGS
09 Oct 2025
ArenaBencher: Automatic Benchmark Evolution via Multi-Model Competitive Evaluation
Qin Liu
Jacob Dineen
Y. Huang
Sheng Zhang
Hoifung Poon
Ben Zhou
Muhao Chen
ELM
09 Oct 2025
Unmasking Backdoors: An Explainable Defense via Gradient-Attention Anomaly Scoring for Pre-trained Language Models
Anindya Sundar Das
Kangjie Chen
M. Bhuyan
SILM, AAML
05 Oct 2025
Think Twice, Generate Once: Safeguarding by Progressive Self-Reflection
Hoang Phan
Victor Li
Qi Lei
KELM, CLL
29 Sep 2025
Active Attacks: Red-teaming LLMs via Adaptive Environments
Taeyoung Yun
P. St-Charles
Jinkyoo Park
Yoshua Bengio
Minsu Kim
AAML
26 Sep 2025
GEP: A GCG-Based Method for Extracting Personally Identifiable Information from Chatbots Built on Small Language Models
Jieli Zhu
Vi Ngoc-Nha Tran
25 Sep 2025
Trigger Where It Hurts: Unveiling Hidden Backdoors through Sensitivity with Sensitron
Gejian Zhao
Hanzhou Wu
Xinpeng Zhang
23 Sep 2025
Semantic Representation Attack against Aligned Large Language Models
Jiawei Lian
Jianhong Pan
L. Wang
Yi Wang
Shaohui Mei
Lap-Pui Chau
18 Sep 2025
Thinking in a Crowd: How Auxiliary Information Shapes LLM Reasoning
Haodong Zhao
Chenyan Zhao
Yansi Li
Zhuosheng Zhang
Gongshen Liu
LRM
17 Sep 2025
A Multi-Agent LLM Defense Pipeline Against Prompt Injection Attacks
S M Asif Hossain
Ruksat Khan Shayoni
Mohd Ruhul Ameen
Akif Islam
M. F. Mridha
Jungpil Shin
LLMAG, SILM, AAML
16 Sep 2025
From Noise to Narrative: Tracing the Origins of Hallucinations in Transformers
Praneet Suresh
Jack Stanley
Sonia Joseph
Luca Scimeca
Danilo Bzdok
08 Sep 2025
"Abuse Risks are Often Inherent to Product Features": Exploring AI Vendors' Bug Bounty and Responsible Disclosure Policies
Yangheran Piao
Jingjie Li
Daniel W. Woods
07 Sep 2025
See No Evil: Adversarial Attacks Against Linguistic-Visual Association in Referring Multi-Object Tracking Systems
Halima Bouzidi
Haoyu Liu
M. A. Al Faruque
AAML
02 Sep 2025
Adaptive Originality Filtering: Rejection-Based Prompting and RiddleScore for Culturally Grounded Multilingual Riddle Generation
Duy Le
Kent Ziti
Evan Girard-Sun
Bakr Bouhaya
Sean O'Brien
Kevin Zhu
26 Aug 2025
Trust Me, I Know This Function: Hijacking LLM Static Analysis using Bias
Shir Bernstein
David Beste
Daniel Ayzenshteyn
Lea Schonherr
Yisroel Mirsky
24 Aug 2025
Universal and Transferable Adversarial Attack on Large Language Models Using Exponentiated Gradient Descent
Sajib Biswas
Mao Nishino
Samuel Jacob Chacko
Xiuwen Liu
AAML
20 Aug 2025
From Charts to Fair Narratives: Uncovering and Mitigating Geo-Economic Biases in Chart-to-Text
Ridwan Mahbub
Mohammed Saidul Islam
Mir Tafseer Nayeem
Md Tahmid Rahman Laskar
Mizanur Rahman
Shafiq Joty
Enamul Hoque
13 Aug 2025
Special-Character Adversarial Attacks on Open-Source Language Models
Ephraiem Sarabamoun
12 Aug 2025
Streamlining Admission with LOR Insights: AI-Based Leadership Assessment in Online Master's Programs
Meryem Yilmaz Soylu
Adrian Gallard
Jeonghyun Lee
Gayane Grigoryan
Rushil Desai
Stephen Harmon
07 Aug 2025
A Few Words Can Distort Graphs: Knowledge Poisoning Attacks on Graph-based Retrieval-Augmented Generation of Large Language Models
Jiayi Wen
Tianxin Chen
Zhirun Zheng
Cheng Huang
06 Aug 2025
TRACEALIGN -- Tracing the Drift: Attributing Alignment Failures to Training-Time Belief Sources in LLMs
A. Das
Vinija Jain
Vasu Sharma
LLMSV
04 Aug 2025
Augmented Vision-Language Models: A Systematic Review
Anthony C Davis
Burhan Sadiq
Tianmin Shu
Chien-Ming Huang
VLM, LRM
24 Jul 2025
Rainbow Noise: Stress-Testing Multimodal Harmful-Meme Detectors on LGBTQ Content
Ran Tong
Songtao Wei
Jiaqi Liu
Lanruo Wang
24 Jul 2025
Manipulating LLM Web Agents with Indirect Prompt Injection Attack via HTML Accessibility Tree
Sam Johnson
Viet Pham
Thai Le
LLMAG
20 Jul 2025
ROBAD: Robust Adversary-aware Local-Global Attended Bad Actor Detection Sequential Model
Bing He
M. Ahamad
Srijan Kumar
AAML
20 Jul 2025
Small Edits, Big Consequences: Telling Good from Bad Robustness in Large Language Models
Altynbek Ismailov
Salia Asanova
KELM
15 Jul 2025
PRM-Free Security Alignment of Large Models via Red Teaming and Adversarial Training
Pengfei Du
AAML
14 Jul 2025
A Mathematical Theory of Discursive Networks
Juan B. Gutiérrez
09 Jul 2025
VERA: Variational Inference Framework for Jailbreaking Large Language Models
Anamika Lochab
Lu Yan
Patrick Pynadath
Xiangyu Zhang
Ruqi Zhang
AAML, VLM
27 Jun 2025
FORTRESS: Frontier Risk Evaluation for National Security and Public Safety
Christina Q. Knight
Kaustubh Deshpande
Ved Sirdeshmukh
Meher Mankikar
Scale Red Team
SEAL Research Team
Julian Michael
AAML, ELM
17 Jun 2025
Transforming Chatbot Text: A Sequence-to-Sequence Approach
Natesh Reddy
Mark Stamp
DeLMO, SILM
15 Jun 2025
Efficient Context Selection for Long-Context QA: No Tuning, No Iteration, Just Adaptive-$k$
Chihiro Taguchi
Seiji Maekawa
Nikita Bhutani
RALM
10 Jun 2025
From Threat to Tool: Leveraging Refusal-Aware Injection Attacks for Safety Alignment
Kyubyung Chae
Hyunbin Jin
Taesup Kim
07 Jun 2025
Page 1 of 14