Universal Adversarial Triggers for Attacking and Analyzing NLP

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
20 August 2019
Eric Wallace
Shi Feng
Nikhil Kandpal
Matt Gardner
Sameer Singh
AAML SILM

Papers citing "Universal Adversarial Triggers for Attacking and Analyzing NLP"

50 / 662 papers shown
SUCEA: Reasoning-Intensive Retrieval for Adversarial Fact-checking through Claim Decomposition and Editing
Hongjun Liu
Yilun Zhao
Arman Cohan
Chen Zhao
AAML LRM
275
0
0
05 Jun 2025
Normative Conflicts and Shallow AI Alignment
Philosophical Studies (Philos. Stud.), 2025
Raphaël Millière
251
3
0
05 Jun 2025
Shaking to Reveal: Perturbation-Based Detection of LLM Hallucinations
Jinyuan Luo
Zhen Fang
Shouqing Yang
Seongheon Park
Ling Chen
AAML HILM
226
0
0
03 Jun 2025
SafeGenes: Evaluating the Adversarial Robustness of Genomic Foundation Models
Huixin Zhan
Clovis Barbour
Jason H. Moore
AAML
132
0
0
01 Jun 2025
A Red Teaming Roadmap Towards System-Level Safety
Zifan Wang
Christina Q. Knight
Jeremy Kritz
Willow Primack
Julian Michael
AAML
305
1
0
30 May 2025
Learning Safety Constraints for Large Language Models
Xin Chen
Yarden As
Andreas Krause
178
7
0
30 May 2025
Towards Secure MLOps: Surveying Attacks, Mitigation Strategies, and Research Challenges
Raj Patel
Himanshu Tripathi
Jasper Stone
Noorbakhsh Amiri Golilarz
Sudip Mittal
Shahram Rahimi
Vini Chaudhary
AAML
191
3
0
30 May 2025
TRAPDOC: Deceiving LLM Users by Injecting Imperceptible Phantom Tokens into Documents
Hyundong Jin
Sicheol Sung
Shinwoo Park
SeungYeop Baik
Yo-Sub Han
259
3
0
30 May 2025
Lifelong Safety Alignment for Language Models
Haoyu Wang
Zeyu Qin
Yifei Zhao
C. Du
Min Lin
Xueqian Wang
Tianyu Pang
KELM CLL
292
6
0
26 May 2025
GhostPrompt: Jailbreaking Text-to-image Generative Models based on Dynamic Optimization
Zixuan Chen
Hao Lin
Ke Xu
Xinghao Jiang
Tanfeng Sun
203
0
0
25 May 2025
Security Concerns for Large Language Models: A Survey
Miles Q. Li
Benjamin C. M. Fung
PILM ELM
796
18
0
24 May 2025
Accidental Vulnerability: Factors in Fine-Tuning that Shift Model Safeguards
Punya Syon Pandey
Samuel Simko
Kellin Pelrine
Zhijing Jin
AAML
263
0
0
22 May 2025
MixAT: Combining Continuous and Discrete Adversarial Training for LLMs
Csaba Dékány
Stefan Balauca
Robin Staab
Dimitar I. Dimitrov
Martin Vechev
AAML
303
1
0
22 May 2025
SPECTRE: Conditional System Prompt Poisoning to Hijack LLMs
Viet Pham
Thai Le
SILM
180
0
0
22 May 2025
Adversarially Pretrained Transformers May Be Universally Robust In-Context Learners
Soichiro Kumano
Hiroshi Kera
Toshihiko Yamasaki
AAML
549
1
0
20 May 2025
Chain-of-Thought Driven Adversarial Scenario Extrapolation for Robust Language Models
Md Rafi Ur Rashid
Vishnu Asutosh Dasu
Ye Wang
Gang Tan
Shagufta Mehnaz
AAML ELM
411
0
0
20 May 2025
Investigating the Vulnerability of LLM-as-a-Judge Architectures to Prompt-Injection Attacks
Narek Maloyan
Bislan Ashinov
Dmitry Namiot
AAML ELM
240
7
0
19 May 2025
Distribution Prompting: Understanding the Expressivity of Language Models Through the Next-Token Distributions They Can Produce
Haojin Wang
Zining Zhu
Freda Shi
279
1
0
18 May 2025
SPIRIT: Patching Speech Language Models against Jailbreak Attacks
Amirbek Djanibekov
Nurdaulet Mukhituly
Kentaro Inui
Hanan Aldarmaki
Nils Lukas
AAML
297
1
0
18 May 2025
Characterizing the Robustness of Black-Box LLM Planners Under Perturbed Observations with Adaptive Stress Testing
Neeloy Chakraborty
John Pohovey
Melkior Ornik
Katherine Driggs-Campbell
356
0
0
08 May 2025
Adversarial Attacks in Multimodal Systems: A Practitioner's Survey
Annual International Computer Software and Applications Conference (COMPSAC), 2025
Shashank Kapoor
Sanjay Surendranath Girija
Lakshit Arora
Dipen Pradhan
Ankit Shetgaonkar
Aman Raj
AAML
512
2
0
06 May 2025
Semantic Probabilistic Control of Language Models
Kareem Ahmed
Catarina G Belém
Padhraic Smyth
Sameer Singh
306
4
0
04 May 2025
Cannot See the Forest for the Trees: Invoking Heuristics and Biases to Elicit Irrational Choices of LLMs
Haoming Yang
Ke Ma
Xiaojun Jia
Yingfei Sun
Qianqian Xu
Qingming Huang
AAML
1.1K
4
0
03 May 2025
LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures
Francisco Aguilera-Martínez
Fernando Berzal
PILM
391
10
0
02 May 2025
Attack and defense techniques in large language models: A survey and new perspectives
Zhiyu Liao
Kang Chen
Yuanguo Lin
Kangkang Li
Yunxuan Liu
Hefeng Chen
Xingwang Huang
Yuanhui Yu
AAML
285
3
0
02 May 2025
OET: Optimization-based prompt injection Evaluation Toolkit
Jinsheng Pan
Xiaogeng Liu
Chaowei Xiao
AAML
341
0
0
01 May 2025
Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision
International Conference on Learning Representations (ICLR), 2025
Weicai Yan
Wang Lin
Zirun Guo
Ye Wang
Fangming Feng
Xiaoda Yang
Liang Luo
Tao Jin
DiffM
661
6
0
30 Apr 2025
Graph of Attacks: Improved Black-Box and Interpretable Jailbreaks for LLMs
Mohammad Akbar-Tajari
Mohammad Taher Pilehvar
Mohammad Mahmoody
AAML
207
1
0
26 Apr 2025
GraphAttack: Exploiting Representational Blindspots in LLM Safety Mechanisms
Sinan He
An Wang
165
0
0
17 Apr 2025
QAVA: Query-Agnostic Visual Attack to Large Vision-Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Yudong Zhang
Ruobing Xie
Jiansheng Chen
Xingwu Sun
Zhanhui Kang
Yu Wang
AAML
242
3
0
15 Apr 2025
NLP Security and Ethics, in the Wild
Transactions of the Association for Computational Linguistics (TACL), 2025
Heather Lent
Erick Galinkin
Yiyi Chen
Jens Myrup Pedersen
Leon Derczynski
Johannes Bjerva
SILM
413
1
0
09 Apr 2025
Revealing the Intrinsic Ethical Vulnerability of Aligned Large Language Models
Jiawei Lian
Jianhong Pan
L. Wang
Yi Wang
Shaohui Mei
Lap-Pui Chau
AAML
679
1
0
07 Apr 2025
On the Robustness of GUI Grounding Models Against Image Attacks
Haoren Zhao
Tianyi Chen
Zhen Wang
AAML
289
6
0
07 Apr 2025
Encrypted Prompt: Securing LLM Applications Against Unauthorized Actions
Shih-Han Chan
AAML
209
5
0
29 Mar 2025
MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks
Wenhao You
Bryan Hooi
Yiwei Wang
Longji Xu
Zong Ke
Ming Yang
Zi Huang
Yujun Cai
AAML
367
4
0
24 Mar 2025
In-House Evaluation Is Not Enough: Towards Robust Third-Party Flaw Disclosure for General-Purpose AI
Shayne Longpre
Kevin Klyman
Ruth E. Appel
Sayash Kapoor
Rishi Bommasani
...
Victoria Westerhoff
Yacine Jernite
Rumman Chowdhury
Percy Liang
Arvind Narayanan
ELM
391
8
0
21 Mar 2025
Prompt Injection Detection and Mitigation via AI Multi-Agent NLP Frameworks
Diego Gosmar
Deborah A. Dahl
Dario Gosmar
AAML
219
9
0
14 Mar 2025
CtrlRAG: Black-box Adversarial Attacks Based on Masked Language Models in Retrieval-Augmented Language Generation
Runqi Sui
AAML
210
3
0
10 Mar 2025
Life-Cycle Routing Vulnerabilities of LLM Router
Qiqi Lin
Xiaoyang Ji
Shengfang Zhai
Qingni Shen
Zhi-Li Zhang
Yuejian Fang
Yansong Gao
AAML
255
3
0
09 Mar 2025
Can Small Language Models Reliably Resist Jailbreak Attacks? A Comprehensive Evaluation
Wenhui Zhang
Huiyu Xu
Peng Kuang
Zeqing He
Ziqi Zhu
Kui Ren
AAML PILM
226
3
0
09 Mar 2025
Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models
Meghana Arakkal Rajeev
Rajkumar Ramamurthy
Prapti Trivedi
Vikas Yadav
Oluwanifemi Bamgbose
Sathwik Tejaswi Madhusudan
James Zou
Nazneen Rajani
AAML LRM
321
12
0
03 Mar 2025
Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models
Alberto Purpura
Sahil Wadhwa
Jesse Zymet
Akshay Gupta
Andy Luo
Melissa Kazemi Rad
Swapnil Shinde
Mohammad Sorower
AAML
997
5
0
03 Mar 2025
UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning
J.N. Zhang
Shuang Yang
B. Li
AAML LLMAG
356
7
0
28 Feb 2025
PolyPrompt: Automating Knowledge Extraction from Multilingual Language Models with Dynamic Prompt Generation
Nathan Roll
303
2
0
27 Feb 2025
Shh, don't say that! Domain Certification in LLMs
International Conference on Learning Representations (ICLR), 2025
Cornelius Emde
Alasdair Paren
Preetham Arvind
Maxime Kayser
Tom Rainforth
Thomas Lukasiewicz
Guohao Li
Juil Sock
Adel Bibi
348
4
0
26 Feb 2025
REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective
Simon Geisler
Tom Wollschlager
M. H. I. Abdalla
Vincent Cohen-Addad
Johannes Gasteiger
Stephan Günnemann
AAML
277
8
0
24 Feb 2025
Attention Eclipse: Manipulating Attention to Bypass LLM Safety-Alignment
Pedram Zaree
Md Abdullah Al Mamun
Quazi Mishkatul Alam
Yue Dong
Ihsen Alouani
Nael B. Abu-Ghazaleh
AAML
258
2
0
24 Feb 2025
Rethinking the Vulnerability of Concept Erasure and a New Method
Alex D. Richardson
Lucas Beerens
Dongdong Chen
DiffM
698
3
0
24 Feb 2025
Interrogating LLM design under a fair learning doctrine
Johnny Tian-Zheng Wei
Maggie Wang
Ameya Godbole
Jonathan H. Choi
Robin Jia
299
0
0
22 Feb 2025
Eliminating Backdoors in Neural Code Models for Secure Code Understanding
Weisong Sun
Yuchen Chen
Chunrong Fang
Yebo Feng
Yuan Xiao
An Guo
Quanjun Zhang
Yang Liu
Baowen Xu
Zhenyu Chen
AAML
343
1
0
21 Feb 2025