1908.07125
v3 (latest)
Universal Adversarial Triggers for Attacking and Analyzing NLP
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
20 August 2019
Eric Wallace
Shi Feng
Nikhil Kandpal
Matt Gardner
Sameer Singh
AAML
SILM
Papers citing "Universal Adversarial Triggers for Attacking and Analyzing NLP"
50 / 662 papers shown
Benchmark Transparency: Measuring the Impact of Data on Evaluation
Venelin Kovatchev
Matthew Lease
181
5
0
31 Mar 2024
LinkPrompt: Natural and Universal Adversarial Attacks on Prompt-based Language Models
Yue Xu
Wenjie Wang
SILM
AAML
266
5
0
25 Mar 2024
Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study
Chenguang Wang
Ruoxi Jia
Xin Liu
Dawn Song
VLM
207
10
0
15 Mar 2024
Take Care of Your Prompt Bias! Investigating and Mitigating Prompt Bias in Factual Knowledge Extraction
International Conference on Language Resources and Evaluation (LREC), 2024
Ziyang Xu
Keqin Peng
Liang Ding
Dacheng Tao
Xiliang Lu
236
19
0
15 Mar 2024
ACT-MNMT Auto-Constriction Turning for Multilingual Neural Machine Translation
Shaojie Dai
Xin Liu
Ping Luo
Yue Yu
LRM
213
1
0
11 Mar 2024
Neural Exec: Learning (and Learning from) Execution Triggers for Prompt Injection Attacks
Dario Pasquini
Martin Strohmeier
Carmela Troncoso
AAML
332
60
0
06 Mar 2024
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Nathaniel Li
Alexander Pan
Anjali Gopal
Summer Yue
Daniel Berrios
...
Yan Shoshitaishvili
Jimmy Ba
K. Esvelt
Alexandr Wang
Dan Hendrycks
ELM
758
305
0
05 Mar 2024
Word Importance Explains How Prompts Affect Language Model Outputs
Stefan Hackmann
Haniyeh Mahmoudian
Mark Steadman
Michael Schmidt
AAML
479
11
0
05 Mar 2024
Curiosity-driven Red-teaming for Large Language Models
Zhang-Wei Hong
Idan Shenfeld
Tsun-Hsuan Wang
Yung-Sung Chuang
Aldo Pareja
James R. Glass
Akash Srivastava
Pulkit Agrawal
LRM
260
77
0
29 Feb 2024
Pointing out the Shortcomings of Relation Extraction Models with Semantically Motivated Adversarials
Gennaro Nolano
Moritz Blum
Basil Ell
Philipp Cimiano
203
3
0
29 Feb 2024
On the Challenges and Opportunities in Generative AI
Laura Manduchi
Kushagra Pandey
Robert Bamler
Sina Daubener
...
Yixin Wang
F. Wenzel
Frank Wood
Stephan Mandt
Vincent Fortuin
762
40
0
28 Feb 2024
Fast Adversarial Attacks on Language Models In One GPU Minute
Vinu Sankar Sadasivan
Shoumik Saha
Gaurang Sriramanan
Priyatham Kattakinda
Atoosa Malemir Chegini
Soheil Feizi
MIALM
337
69
0
23 Feb 2024
CEV-LM: Controlled Edit Vector Language Model for Shaping Natural Language Generations
Samraj Moorjani
A. Krishnan
Hari Sundaram
KELM
188
1
0
22 Feb 2024
Coercing LLMs to do and reveal (almost) anything
Jonas Geiping
Alex Stein
Manli Shu
Khalid Saifullah
Yuxin Wen
Tom Goldstein
AAML
239
82
0
21 Feb 2024
Understanding and Mitigating the Threat of Vec2Text to Dense Retrieval Systems
Shengyao Zhuang
Bevan Koopman
Xiaoran Chu
Guido Zuccon
245
7
0
20 Feb 2024
Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question?
Nishant Balepur
Abhilasha Ravichander
Rachel Rudinger
ELM
333
60
0
19 Feb 2024
FIPO: Free-form Instruction-oriented Prompt Optimization with Preference Dataset and Modular Fine-tuning Schema
Junru Lu
Siyu An
Min Zhang
Yulan He
Di Yin
Xing Sun
296
5
0
19 Feb 2024
Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation
Xunjian Yin
Xu Zhang
Jie Ruan
Xiaojun Wan
ELM
366
36
0
18 Feb 2024
TuneTables: Context Optimization for Scalable Prior-Data Fitted Networks
Ben Feuer
R. Schirrmeister
Valeriia Cherepanova
Chinmay Hegde
Katharina Eggensperger
Micah Goldblum
Niv Cohen
Colin White
290
30
0
17 Feb 2024
Representation Surgery: Theory and Practice of Affine Steering
Shashwat Singh
Shauli Ravfogel
Jonathan Herzig
Roee Aharoni
Robert Bamler
Ponnurangam Kumaraguru
LLMSV
494
30
0
15 Feb 2024
Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
Zhichen Dong
Zhanhui Zhou
Chao Yang
Jing Shao
Yu Qiao
ELM
313
130
0
14 Feb 2024
Attacking Large Language Models with Projected Gradient Descent
Simon Geisler
Tom Wollschlager
M. H. I. Abdalla
Johannes Gasteiger
Stephan Günnemann
AAML
SILM
319
97
0
14 Feb 2024
COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
Xing-ming Guo
Fangxu Yu
Huan Zhang
Lianhui Qin
Bin Hu
AAML
435
148
0
13 Feb 2024
Test-Time Backdoor Attacks on Multimodal Large Language Models
Dong Lu
Tianyu Pang
Chao Du
Qian Liu
Xianjun Yang
Min Lin
AAML
386
37
0
13 Feb 2024
Discovering Universal Semantic Triggers for Text-to-Image Synthesis
Shengfang Zhai
Weilong Wang
Jiajun Li
Yinpeng Dong
Hang Su
Qingni Shen
EGVM
150
4
0
12 Feb 2024
Prompt Perturbation in Retrieval-Augmented Generation based Large Language Models
Knowledge Discovery and Data Mining (KDD), 2024
Zhibo Hu
Chen Wang
Yanfeng Shu
Helen Paik
Liming Zhu
SILM
RALM
217
27
0
11 Feb 2024
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
Mantas Mazeika
Long Phan
Xuwang Yin
Andy Zou
Zifan Wang
...
Nathaniel Li
Steven Basart
Bo Li
David A. Forsyth
Dan Hendrycks
AAML
360
741
0
06 Feb 2024
Beyond Lines and Circles: Unveiling the Geometric Reasoning Gap in Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Spyridon Mouselinos
Henryk Michalewski
Mateusz Malinowski
LRM
187
11
0
06 Feb 2024
PAP-REC: Personalized Automatic Prompt for Recommendation Language Model
Zelong Li
Jianchao Ji
Yingqiang Ge
Qingfeng Lan
Zelong Li
208
7
0
01 Feb 2024
Navigating the OverKill in Large Language Models
Chenyu Shi
Xiao Wang
Qiming Ge
Songyang Gao
Xianjun Yang
Tao Gui
Xuanjing Huang
Xun Zhao
Dahua Lin
219
26
0
31 Jan 2024
Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks
Andy Zhou
Bo Li
Haohan Wang
AAML
428
133
0
30 Jan 2024
Gradient-Based Language Model Red Teaming
Nevan Wichers
Carson E. Denison
Ahmad Beirami
249
41
0
30 Jan 2024
Single Word Change is All You Need: Using LLMs to Create Synthetic Training Examples for Text Classifiers
Lei Xu
Sarah Alnegheimish
Laure Berti-Equille
Alfredo Cuesta-Infante
K. Veeramachaneni
AAML
270
0
0
30 Jan 2024
Tradeoffs Between Alignment and Helpfulness in Language Models with Steering Methods
Yotam Wolf
Noam Wies
Dorin Shteyman
Binyamin Rothberg
Yoav Levine
Amnon Shashua
LLMSV
685
18
0
29 Jan 2024
Black-Box Access is Insufficient for Rigorous AI Audits
Conference on Fairness, Accountability and Transparency (FAccT), 2024
Stephen Casper
Carson Ezell
Charlotte Siegmann
Noam Kolt
Taylor Lynn Curtis
...
Michael Gerovitch
David Bau
Max Tegmark
David M. Krueger
Dylan Hadfield-Menell
AAML
560
133
0
25 Jan 2024
Text Embedding Inversion Security for Multilingual Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yiyi Chen
Heather Lent
Johannes Bjerva
444
24
0
22 Jan 2024
Finding a Needle in the Adversarial Haystack: A Targeted Paraphrasing Approach For Uncovering Edge Cases with Minimal Distribution Distortion
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2024
Aly M. Kassem
Sherif Saad
AAML
301
3
0
21 Jan 2024
PRewrite: Prompt Rewriting with Reinforcement Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Weize Kong
Spurthi Amba Hombaiah
Mingyang Zhang
Qiaozhu Mei
Michael Bendersky
LLMAG
237
38
0
16 Jan 2024
Generative AI in EU Law: Liability, Privacy, Intellectual Property, and Cybersecurity
Social Science Research Network (SSRN), 2024
Claudio Novelli
F. Casolari
Philipp Hacker
Giorgio Spedicato
Luciano Floridi
AILaw
SILM
444
99
0
14 Jan 2024
Parameter-Efficient Detoxification with Contrastive Decoding
Tong Niu
Caiming Xiong
Semih Yavuz
Yingbo Zhou
164
17
0
13 Jan 2024
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
International Conference on Machine Learning (ICML), 2024
Andrew Lee
Xiaoyan Bai
Itamar Pres
Martin Wattenberg
Jonathan K. Kummerfeld
Amélie Reymond
324
159
0
03 Jan 2024
SA²VP: Spatially Aligned-and-Adapted Visual Prompt
AAAI Conference on Artificial Intelligence (AAAI), 2023
Wenjie Pei
Tongqi Xia
Fanglin Chen
Jinsong Li
Jiandong Tian
Guangming Lu
VLM
VPVLM
181
25
0
16 Dec 2023
SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models
Lee Hyun
Kim Sung-Bin
Seungju Han
Youngjae Yu
Tae-Hyun Oh
414
21
0
15 Dec 2023
Taxonomy-based CheckList for Large Language Model Evaluation
Damin Zhang
149
0
0
15 Dec 2023
Silent Guardian: Protecting Text from Malicious Exploitation by Large Language Models
IEEE Transactions on Information Forensics and Security (IEEE TIFS), 2023
Jiawei Zhao
Kejiang Chen
Xianjian Yuan
Yuang Qi
Weiming Zhang
Neng H. Yu
261
14
0
15 Dec 2023
Dissecting vocabulary biases datasets through statistical testing and automated data augmentation for artifact mitigation in Natural Language Inference
Dat Thanh Nguyen
106
0
0
14 Dec 2023
Accelerating the Global Aggregation of Local Explanations
AAAI Conference on Artificial Intelligence (AAAI), 2023
Alon Mor
Yonatan Belinkov
B. Kimelfeld
FAtt
219
6
0
13 Dec 2023
ToViLaG: Your Visual-Language Generative Model is Also An Evildoer
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Xinpeng Wang
Xiaoyuan Yi
Han Jiang
Shanlin Zhou
Zhihua Wei
Xing Xie
251
25
0
13 Dec 2023
Tell, don't show: Declarative facts influence how LLMs generalize
Alexander Meinke
Owain Evans
224
9
0
12 Dec 2023
LLF-Bench: Benchmark for Interactive Learning from Language Feedback
Ching-An Cheng
Andrey Kolobov
Dipendra Kumar Misra
Allen Nie
Adith Swaminathan
266
24
0
11 Dec 2023