
arXiv:1908.07125

Universal Adversarial Triggers for Attacking and Analyzing NLP

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
20 August 2019
Eric Wallace
Shi Feng
Nikhil Kandpal
Matt Gardner
Sameer Singh
AAML, SILM

Papers citing "Universal Adversarial Triggers for Attacking and Analyzing NLP"

50 / 662 papers shown
Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models
Z. Wang
Jie M. Zhang
Shiguang Shan
Xilin Chen
AAML
29 Nov 2025
TASO: Jailbreak LLMs via Alternative Template and Suffix Optimization
Yanting Wang
Runpeng Geng
Jinghui Chen
Minhao Cheng
Jinyuan Jia
23 Nov 2025
PARROT: Persuasion and Agreement Robustness Rating of Output Truth -- A Sycophancy Robustness Benchmark for LLMs
Yusuf Çelebi
Mahmoud El Hussieni
Özay Ezerceli
AAML
21 Nov 2025
SteganoBackdoor: Stealthy and Data-Efficient Backdoor Attacks on Language Models
Eric Xue
Ruiyi Zhang
Zijun Zhang
AAML
18 Nov 2025
Training Language Models to Explain Their Own Computations
Belinda Z. Li
Zifan Carl Guo
Vincent Huang
Jacob Steinhardt
Jacob Andreas
LRM
11 Nov 2025
AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models
Aashray Reddy
Andrew Zagula
Nicholas Saban
AAML, MU, SILM
04 Nov 2025
"Give a Positive Review Only": An Early Investigation Into In-Paper Prompt Injection Attacks and Defenses for AI Reviewers
Qin Zhou
Zhexin Zhang
Zhi Li
Limin Sun
AAML
03 Nov 2025
NeuroGenPoisoning: Neuron-Guided Attacks on Retrieval-Augmented Generation of LLM via Genetic Optimization of External Knowledge
Hanyu Zhu
Lance Fiondella
Jiawei Yuan
K. Zeng
Long Jiao
SILM, AAML, KELM
24 Oct 2025
Toward Understanding the Transferability of Adversarial Suffixes in Large Language Models
Sarah Ball
Niki Hasrati
Alexander Robey
Avi Schwarzschild
Frauke Kreuter
Zico Kolter
Andrej Risteski
AAML
24 Oct 2025
Can Transformer Memory Be Corrupted? Investigating Cache-Side Vulnerabilities in Large Language Models
Elias Hossain
Swayamjit Saha
Somshubhra Roy
Ravi Prasad
20 Oct 2025
Online Learning Defense against Iterative Jailbreak Attacks via Prompt Optimization
Masahiro Kaneko
Zeerak Talat
Timothy Baldwin
AAML
19 Oct 2025
SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models
Hanbin Hong
Shuya Feng
Nima Naderloui
Shenao Yan
Jingyu Zhang
Biying Liu
Ali Arastehfard
Heqing Huang
Yuan Hong
AAML
17 Oct 2025
Are My Optimized Prompts Compromised? Exploring Vulnerabilities of LLM-based Optimizers
Andrew Zhao
Reshmi Ghosh
Vitor Carvalho
Emily Lawton
Keegan Hines
Gao Huang
Jack W. Stokes
AAML, SILM
16 Oct 2025
Selective Adversarial Attacks on LLM Benchmarks
Ivan Dubrovsky
Anastasia Orlova
Illarion Iov
Nina Gubina
Irena Gureeva
Alexey Zaytsev
AAML
15 Oct 2025
In-Browser LLM-Guided Fuzzing for Real-Time Prompt Injection Testing in Agentic AI Browsers
Avihay Cohen
SILM, LLMAG, AI4CE
15 Oct 2025
CoBia: Constructed Conversations Can Trigger Otherwise Concealed Societal Biases in LLMs
Nafiseh Nikeghbal
Amir Hossein Kargaran
Jana Diesner
10 Oct 2025
GREAT: Generalizable Backdoor Attacks in RLHF via Emotion-Aware Trigger Synthesis
Subrat Kishore Dutta
Yuelin Xu
Piyush Pant
Xiao Zhang
AAML
10 Oct 2025
SyncHuman: Synchronizing 2D and 3D Generative Models for Single-view Human Reconstruction
Wenyue Chen
Peng Li
Wangguandong Zheng
Chengfeng Zhao
Mengfei Li
Yaolong Zhu
Zhiyang Dou
Ronggang Wang
Yuan Liu
3DH, 3DGS
09 Oct 2025
ArenaBencher: Automatic Benchmark Evolution via Multi-Model Competitive Evaluation
Qin Liu
Jacob Dineen
Y. Huang
Sheng Zhang
Hoifung Poon
Ben Zhou
Muhao Chen
ELM
09 Oct 2025
Unmasking Backdoors: An Explainable Defense via Gradient-Attention Anomaly Scoring for Pre-trained Language Models
Anindya Sundar Das
Kangjie Chen
M. Bhuyan
SILM, AAML
05 Oct 2025
Think Twice, Generate Once: Safeguarding by Progressive Self-Reflection
Hoang Phan
Victor Li
Qi Lei
KELM, CLL
29 Sep 2025
Active Attacks: Red-teaming LLMs via Adaptive Environments
Taeyoung Yun
P. St-Charles
Jinkyoo Park
Yoshua Bengio
Minsu Kim
AAML
26 Sep 2025
GEP: A GCG-Based Method for Extracting Personally Identifiable Information from Chatbots Built on Small Language Models
Jieli Zhu
Vi Ngoc-Nha Tran
25 Sep 2025
Trigger Where It Hurts: Unveiling Hidden Backdoors through Sensitivity with Sensitron
Gejian Zhao
Hanzhou Wu
Xinpeng Zhang
23 Sep 2025
Semantic Representation Attack against Aligned Large Language Models
Jiawei Lian
Jianhong Pan
L. Wang
Yi Wang
Shaohui Mei
Lap-Pui Chau
18 Sep 2025
Thinking in a Crowd: How Auxiliary Information Shapes LLM Reasoning
Haodong Zhao
Chenyan Zhao
Yansi Li
Zhuosheng Zhang
Gongshen Liu
LRM
17 Sep 2025
A Multi-Agent LLM Defense Pipeline Against Prompt Injection Attacks
S M Asif Hossain
Ruksat Khan Shayoni
Mohd Ruhul Ameen
Akif Islam
M. F. Mridha
Jungpil Shin
LLMAG, SILM, AAML
16 Sep 2025
From Noise to Narrative: Tracing the Origins of Hallucinations in Transformers
Praneet Suresh
Jack Stanley
Sonia Joseph
Luca Scimeca
Danilo Bzdok
08 Sep 2025
"Abuse Risks are Often Inherent to Product Features": Exploring AI Vendors' Bug Bounty and Responsible Disclosure Policies
Yangheran Piao
Jingjie Li
Daniel W. Woods
07 Sep 2025
See No Evil: Adversarial Attacks Against Linguistic-Visual Association in Referring Multi-Object Tracking Systems
Halima Bouzidi
Haoyu Liu
M. A. Al Faruque
AAML
02 Sep 2025
Adaptive Originality Filtering: Rejection-Based Prompting and RiddleScore for Culturally Grounded Multilingual Riddle Generation
Duy Le
Kent Ziti
Evan Girard-Sun
Bakr Bouhaya
Sean O'Brien
Kevin Zhu
26 Aug 2025
Trust Me, I Know This Function: Hijacking LLM Static Analysis using Bias
Shir Bernstein
David Beste
Daniel Ayzenshteyn
Lea Schonherr
Yisroel Mirsky
24 Aug 2025
Universal and Transferable Adversarial Attack on Large Language Models Using Exponentiated Gradient Descent
Sajib Biswas
Mao Nishino
Samuel Jacob Chacko
Xiuwen Liu
AAML
20 Aug 2025
From Charts to Fair Narratives: Uncovering and Mitigating Geo-Economic Biases in Chart-to-Text
Ridwan Mahbub
Mohammed Saidul Islam
Mir Tafseer Nayeem
Md Tahmid Rahman Laskar
Mizanur Rahman
Shafiq Joty
Enamul Hoque
13 Aug 2025
Special-Character Adversarial Attacks on Open-Source Language Models
Ephraiem Sarabamoun
12 Aug 2025
Streamlining Admission with LOR Insights: AI-Based Leadership Assessment in Online Master's Programs
Meryem Yilmaz Soylu
Adrian Gallard
Jeonghyun Lee
Gayane Grigoryan
Rushil Desai
Stephen Harmon
07 Aug 2025
A Few Words Can Distort Graphs: Knowledge Poisoning Attacks on Graph-based Retrieval-Augmented Generation of Large Language Models
Jiayi Wen
Tianxin Chen
Zhirun Zheng
Cheng Huang
06 Aug 2025
TRACEALIGN -- Tracing the Drift: Attributing Alignment Failures to Training-Time Belief Sources in LLMs
A. Das
Vinija Jain
Vasu Sharma
LLMSV
04 Aug 2025
Augmented Vision-Language Models: A Systematic Review
Anthony C Davis
Burhan Sadiq
Tianmin Shu
Chien-Ming Huang
VLM, LRM
24 Jul 2025
Rainbow Noise: Stress-Testing Multimodal Harmful-Meme Detectors on LGBTQ Content
Ran Tong
Songtao Wei
Jiaqi Liu
Lanruo Wang
24 Jul 2025
Manipulating LLM Web Agents with Indirect Prompt Injection Attack via HTML Accessibility Tree
Sam Johnson
Viet Pham
Thai Le
LLMAG
20 Jul 2025
ROBAD: Robust Adversary-aware Local-Global Attended Bad Actor Detection Sequential Model
Bing He
M. Ahamad
Srijan Kumar
AAML
20 Jul 2025
Small Edits, Big Consequences: Telling Good from Bad Robustness in Large Language Models
Altynbek Ismailov
Salia Asanova
KELM
15 Jul 2025
PRM-Free Security Alignment of Large Models via Red Teaming and Adversarial Training
Pengfei Du
AAML
14 Jul 2025
A Mathematical Theory of Discursive Networks
Juan B. Gutiérrez
09 Jul 2025
VERA: Variational Inference Framework for Jailbreaking Large Language Models
Anamika Lochab
Lu Yan
Patrick Pynadath
Xiangyu Zhang
Ruqi Zhang
AAML, VLM
27 Jun 2025
FORTRESS: Frontier Risk Evaluation for National Security and Public Safety
Christina Q. Knight
Kaustubh Deshpande
Ved Sirdeshmukh
Meher Mankikar
Scale Red Team
SEAL Research Team
Julian Michael
AAML, ELM
17 Jun 2025
Transforming Chatbot Text: A Sequence-to-Sequence Approach
Natesh Reddy
Mark Stamp
DeLMO, SILM
15 Jun 2025
Efficient Context Selection for Long-Context QA: No Tuning, No Iteration, Just Adaptive-$k$
Chihiro Taguchi
Seiji Maekawa
Nikita Bhutani
RALM
10 Jun 2025
From Threat to Tool: Leveraging Refusal-Aware Injection Attacks for Safety Alignment
Kyubyung Chae
Hyunbin Jin
Taesup Kim
07 Jun 2025
Page 1 of 14