arXiv: 1908.07125
Universal Adversarial Triggers for Attacking and Analyzing NLP
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
20 August 2019
Eric Wallace
Shi Feng
Nikhil Kandpal
Matt Gardner
Sameer Singh
AAML
SILM
Papers citing "Universal Adversarial Triggers for Attacking and Analyzing NLP"
50 / 662 papers shown
Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification
Boyang Zhang
Yicong Tan
Yun Shen
Ahmed Salem
Michael Backes
Savvas Zannettou
Yang Zhang
LLMAG
AAML
280
55
0
30 Jul 2024
Scaling Trends in Language Model Robustness
Nikolaus Howe
Michal Zajac
I. R. McKenzie
Oskar Hollinsworth
Tom Tseng
Aaron David Tucker
Pierre-Luc Bacon
Adam Gleave
647
1
0
25 Jul 2024
Multimodal Unlearnable Examples: Protecting Data against Multimodal Contrastive Learning
Xinwei Liu
Yang Liu
Yuan Xun
Yaning Tan
Simeng Qin
284
13
0
23 Jul 2024
Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
Apurv Verma
Satyapriya Krishna
Sebastian Gehrmann
Madhavan Seshadri
Anu Pradhan
Tom Ault
Leslie Barrett
David Rabinowitz
John Doucette
Nhathai Phan
438
42
0
20 Jul 2024
Human-Interpretable Adversarial Prompt Attack on Large Language Models with Situational Context
Nilanjana Das
Edward Raff
Manas Gaur
AAML
327
7
0
19 Jul 2024
Robust Neural Information Retrieval: An Adversarial and Out-of-distribution Perspective
Yu-An Liu
Ruqing Zhang
Jiafeng Guo
Maarten de Rijke
Yixing Fan
Xueqi Cheng
400
20
0
09 Jul 2024
Raply: A profanity-mitigated rap generator
Omar Manil Bendali
Samir Ferroum
Ekaterina Kozachenko
Youssef Parviz
Hanna Shcharbakova
Anna Tokareva
Shemair Williams
121
0
0
09 Jul 2024
AI Safety in Generative AI Large Language Models: A Survey
Jaymari Chua
Yun Yvonna Li
Shiyi Yang
Chen Wang
Lina Yao
LM&MA
364
37
0
06 Jul 2024
On the Low-Rank Parametrization of Reward Models for Controlled Language Generation
S. Troshin
Vlad Niculae
Antske Fokkens
190
0
0
05 Jul 2024
Defense Against Syntactic Textual Backdoor Attacks with Token Substitution
Xinglin Li
Xianwen He
Yao Li
Minhao Cheng
200
1
0
04 Jul 2024
Securing Multi-turn Conversational Language Models Against Distributed Backdoor Triggers
Terry Tong
Lyne Tchapmi
Qin Liu
Muhao Chen
AAML
SILM
283
6
0
04 Jul 2024
Whispering Experts: Neural Interventions for Toxicity Mitigation in Language Models
Xavier Suau
Pieter Delobelle
Katherine Metcalf
Armand Joulin
N. Apostoloff
Luca Zappella
P. Rodríguez
MU
AAML
273
27
0
02 Jul 2024
Jailbreaking LLMs with Arabic Transliteration and Arabizi
Mansour Al Ghanim
Saleh Almohaimeed
Mengxin Zheng
Yan Solihin
Qian Lou
184
7
0
26 Jun 2024
The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm
Aakanksha
Arash Ahmadian
Beyza Ermis
Seraphina Goldfarb-Tarrant
Julia Kreutzer
Marzieh Fadaee
Sara Hooker
365
53
0
26 Jun 2024
JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models
Haibo Jin
Leyang Hu
Xinuo Li
Peiyan Zhang
Chonghan Chen
Jun Zhuang
Haohan Wang
PILM
423
61
0
26 Jun 2024
FrenchToxicityPrompts: a Large Benchmark for Evaluating and Mitigating Toxicity in French Texts
Caroline Brun
Vassilina Nikoulina
292
5
0
25 Jun 2024
Data Augmentation of Multi-turn Psychological Dialogue via Knowledge-driven Progressive Thought Prompting
Jiyue Jiang
Liheng Chen
Sheng Wang
Lingpeng Kong
Yu Li
Chuan Wu
244
0
0
24 Jun 2024
Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference
Anton Xue
Avishree Khare
Rajeev Alur
Surbhi Goel
Eric Wong
689
4
0
21 Jun 2024
Adversaries Can Misuse Combinations of Safe Models
Erik Jones
Anca Dragan
Jacob Steinhardt
256
18
0
20 Jun 2024
Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data
Nahema Marchal
Rachel Xu
Rasmi Elasmar
Iason Gabriel
Beth Goldberg
William S. Isaac
LLMAG
239
35
0
19 Jun 2024
Enhancing Cross-Prompt Transferability in Vision-Language Models through Contextual Injection of Target Tokens
Xikang Yang
Xuehai Tang
Fuqing Zhu
Jizhong Han
Songlin Hu
VLM
AAML
208
3
0
19 Jun 2024
"Not Aligned" is Not "Malicious": Being Careful about Hallucinations of Large Language Models' Jailbreak
Lingrui Mei
Shenghua Liu
Yiwei Wang
Baolong Bi
Jiayi Mao
Xueqi Cheng
AAML
209
19
0
17 Jun 2024
Enhancing Question Answering on Charts Through Effective Pre-training Tasks
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackBoxNLP), 2024
Ashim Gupta
Vivek Gupta
Shuo Zhang
Yujie He
Ning Zhang
Shalin S Shah
142
4
0
14 Jun 2024
Analyzing Multi-Head Attention on Trojan BERT Models
Jingwei Wang
181
0
0
12 Jun 2024
CR-UTP: Certified Robustness against Universal Text Perturbations on Large Language Models
Qian Lou
Xin Liang
Jiaqi Xue
Yancheng Zhang
Rui Xie
Mengxin Zheng
AAML
292
0
0
04 Jun 2024
Tool Learning with Large Language Models: A Survey
Changle Qu
Sunhao Dai
Xiaochi Wei
Hengyi Cai
Shuaiqiang Wang
D. Yin
Jun Xu
Jirong Wen
LLMAG
342
214
0
28 May 2024
White-box Multimodal Jailbreaks Against Large Vision-Language Models
Ruofan Wang
Jiabo He
Hanxu Zhou
Chuanjun Ji
Guangnan Ye
Yu-Gang Jiang
AAML
VLM
252
38
0
28 May 2024
Improved Generation of Adversarial Examples Against Safety-aligned LLMs
Qizhang Li
Yiwen Guo
Wangmeng Zuo
Hao Chen
AAML
SILM
240
12
0
28 May 2024
Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yiming Chen
Chen Zhang
Danqing Luo
L. F. D’Haro
R. Tan
Haizhou Li
AAML
ELM
225
3
0
23 May 2024
ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation
Neural Information Processing Systems (NeurIPS), 2024
Jingnan Zheng
Han Wang
An Zhang
Tai D. Nguyen
Jun Sun
Tat-Seng Chua
LLMAG
357
39
0
23 May 2024
Efficient Universal Goal Hijacking with Semantics-guided Prompt Organization
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yihao Huang
Chong Wang
Yang Liu
Qing Guo
Felix Juefei Xu
Jian Zhang
G. Pu
Yang Liu
328
9
0
23 May 2024
Model Editing as a Robust and Denoised variant of DPO: A Case Study on Toxicity
Rheeya Uppaal
Apratim De
Yiting He
Yiqiao Zhong
Junjie Hu
592
7
0
22 May 2024
DEGAP: Dual Event-Guided Adaptive Prefixes for Templated-Based Event Argument Extraction with Slot Querying
Guanghui Wang
Dexi Liu
Jian-Yun Nie
Qizhi Wan
Rong Hu
Xiping Liu
Wanlong Liu
Jiaming Liu
724
3
0
22 May 2024
A Constraint-Enforcing Reward for Adversarial Attacks on Text Classifiers
Tom Roth
Inigo Jauregi Unanue
A. Abuadbba
Massimo Piccardi
AAML
SILM
226
2
0
20 May 2024
Rethinking ChatGPT's Success: Usability and Cognitive Behaviors Enabled by Auto-regressive LLMs' Prompting
Xinzhe Li
Ming Liu
248
1
0
17 May 2024
Red Teaming Language Models for Contradictory Dialogues
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Xiaofei Wen
Bangzheng Li
Tenghao Huang
Muhao Chen
272
0
0
16 May 2024
PLeak: Prompt Leaking Attacks against Large Language Model Applications
Conference on Computer and Communications Security (CCS), 2024
Bo Hui
Haolin Yuan
Neil Zhenqiang Gong
Philippe Burlina
Yinzhi Cao
AAML
LLMAG
SILM
454
113
0
10 May 2024
Logical Negation Augmenting and Debiasing for Prompt-based Methods
Yitian Li
Jidong Tian
Hao He
Yaohui Jin
197
0
0
08 May 2024
Hire Me or Not? Examining Language Model's Behavior with Occupation Attributes
International Conference on Computational Linguistics (COLING), 2024
Damin Zhang
Yi Zhang
Geetanjali Bihani
Julia Taylor Rayz
483
4
0
06 May 2024
Get more for less: Principled Data Selection for Warming Up Fine-Tuning in LLMs
International Conference on Learning Representations (ICLR), 2024
Feiyang Kang
H. Just
Yifan Sun
Himanshu Jahagirdar
Yuanzhi Zhang
Rongxing Du
Anit Kumar Sahu
Ruoxi Jia
191
31
0
05 May 2024
Assessing Adversarial Robustness of Large Language Models: An Empirical Study
Zeyu Yang
Zhao Meng
Xiaochen Zheng
Roger Wattenhofer
ELM
AAML
167
21
0
04 May 2024
Talking Nonsense: Probing Large Language Models' Understanding of Adversarial Gibberish Inputs
Valeriia Cherepanova
James Zou
AAML
350
9
0
26 Apr 2024
Trojan Detection in Large Language Models: Insights from The Trojan Detection Challenge
Narek Maloyan
Ekansh Verma
Bulat Nutfullin
Bislan Ashinov
208
17
0
21 Apr 2024
AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs
Anselm Paulus
Arman Zharmagambetov
Chuan Guo
Brandon Amos
Yuandong Tian
AAML
385
123
0
21 Apr 2024
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Eric Wallace
Kai Y. Xiao
R. Leike
Lilian Weng
Johannes Heidecke
Alex Beutel
SILM
349
235
0
19 Apr 2024
SpamDam: Towards Privacy-Preserving and Adversary-Resistant SMS Spam Detection
Yekai Li
Rufan Zhang
Wenxin Rong
Xianghang Mi
208
7
0
15 Apr 2024
Interactive Prompt Debugging with Sequence Salience
Ian Tenney
Ryan Mullins
Bin Du
Shree Pandya
Minsuk Kahng
Lucas Dixon
LRM
180
6
0
11 Apr 2024
Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data
Tim Baumgärtner
Yang Gao
Dana Alon
Donald Metzler
AAML
228
33
0
08 Apr 2024
Goal-guided Generative Prompt Injection Attack on Large Language Models
Kai Wei
Haoyang Ling
Qinkai Yu
Chengzhi Liu
Haochen Xue
Xiaobo Jin
AAML
SILM
293
27
0
06 Apr 2024
PID Control-Based Self-Healing to Improve the Robustness of Large Language Models
Zhuotong Chen
Zihu Wang
Yifan Yang
Qianxiao Li
Zheng Zhang
AAML
245
3
0
31 Mar 2024
Page 4 of 14