Universal Adversarial Triggers for Attacking and Analyzing NLP

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
20 August 2019
Eric Wallace
Shi Feng
Nikhil Kandpal
Matt Gardner
Sameer Singh
    AAML SILM

Papers citing "Universal Adversarial Triggers for Attacking and Analyzing NLP"

50 / 662 papers shown
Benchmark Transparency: Measuring the Impact of Data on Evaluation
Venelin Kovatchev
Matthew Lease
181
5
0
31 Mar 2024
LinkPrompt: Natural and Universal Adversarial Attacks on Prompt-based Language Models
Yue Xu
Wenjie Wang
SILM AAML
266
5
0
25 Mar 2024
Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study
Chenguang Wang
Ruoxi Jia
Xin Liu
Dawn Song
VLM
207
10
0
15 Mar 2024
Take Care of Your Prompt Bias! Investigating and Mitigating Prompt Bias in Factual Knowledge Extraction
International Conference on Language Resources and Evaluation (LREC), 2024
Ziyang Xu
Keqin Peng
Liang Ding
Dacheng Tao
Xiliang Lu
236
19
0
15 Mar 2024
ACT-MNMT Auto-Constriction Turning for Multilingual Neural Machine Translation
Shaojie Dai
Xin Liu
Ping Luo
Yue Yu
LRM
213
1
0
11 Mar 2024
Neural Exec: Learning (and Learning from) Execution Triggers for Prompt Injection Attacks
Dario Pasquini
Martin Strohmeier
Carmela Troncoso
AAML
332
60
0
06 Mar 2024
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Nathaniel Li
Alexander Pan
Anjali Gopal
Summer Yue
Daniel Berrios
...
Yan Shoshitaishvili
Jimmy Ba
K. Esvelt
Alexandr Wang
Dan Hendrycks
ELM
758
305
0
05 Mar 2024
Word Importance Explains How Prompts Affect Language Model Outputs
Stefan Hackmann
Haniyeh Mahmoudian
Mark Steadman
Michael Schmidt
AAML
479
11
0
05 Mar 2024
Curiosity-driven Red-teaming for Large Language Models
Zhang-Wei Hong
Idan Shenfeld
Tsun-Hsuan Wang
Yung-Sung Chuang
Aldo Pareja
James R. Glass
Akash Srivastava
Pulkit Agrawal
LRM
260
77
0
29 Feb 2024
Pointing out the Shortcomings of Relation Extraction Models with Semantically Motivated Adversarials
Gennaro Nolano
Moritz Blum
Basil Ell
Philipp Cimiano
203
3
0
29 Feb 2024
On the Challenges and Opportunities in Generative AI
Laura Manduchi
Kushagra Pandey
Robert Bamler
Sina Daubener
...
Yixin Wang
F. Wenzel
Frank Wood
Stephan Mandt
Vincent Fortuin
762
40
0
28 Feb 2024
Fast Adversarial Attacks on Language Models In One GPU Minute
Vinu Sankar Sadasivan
Shoumik Saha
Gaurang Sriramanan
Priyatham Kattakinda
Atoosa Malemir Chegini
Soheil Feizi
MIALM
337
69
0
23 Feb 2024
CEV-LM: Controlled Edit Vector Language Model for Shaping Natural Language Generations
Samraj Moorjani
A. Krishnan
Hari Sundaram
KELM
188
1
0
22 Feb 2024
Coercing LLMs to do and reveal (almost) anything
Jonas Geiping
Alex Stein
Manli Shu
Khalid Saifullah
Yuxin Wen
Tom Goldstein
AAML
239
82
0
21 Feb 2024
Understanding and Mitigating the Threat of Vec2Text to Dense Retrieval Systems
Shengyao Zhuang
Bevan Koopman
Xiaoran Chu
Guido Zuccon
245
7
0
20 Feb 2024
Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question?
Nishant Balepur
Abhilasha Ravichander
Rachel Rudinger
ELM
333
60
0
19 Feb 2024
FIPO: Free-form Instruction-oriented Prompt Optimization with Preference Dataset and Modular Fine-tuning Schema
Junru Lu
Siyu An
Min Zhang
Yulan He
Di Yin
Xing Sun
296
5
0
19 Feb 2024
Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation
Xunjian Yin
Xu Zhang
Jie Ruan
Xiaojun Wan
ELM
366
36
0
18 Feb 2024
TuneTables: Context Optimization for Scalable Prior-Data Fitted Networks
Ben Feuer
R. Schirrmeister
Valeriia Cherepanova
Chinmay Hegde
Katharina Eggensperger
Micah Goldblum
Niv Cohen
Colin White
290
30
0
17 Feb 2024
Representation Surgery: Theory and Practice of Affine Steering
Shashwat Singh
Shauli Ravfogel
Jonathan Herzig
Roee Aharoni
Robert Bamler
Ponnurangam Kumaraguru
LLMSV
494
30
0
15 Feb 2024
Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
Zhichen Dong
Zhanhui Zhou
Chao Yang
Jing Shao
Yu Qiao
ELM
313
130
0
14 Feb 2024
Attacking Large Language Models with Projected Gradient Descent
Simon Geisler
Tom Wollschlager
M. H. I. Abdalla
Johannes Gasteiger
Stephan Günnemann
AAML SILM
319
97
0
14 Feb 2024
COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
Xing-ming Guo
Fangxu Yu
Huan Zhang
Lianhui Qin
Bin Hu
AAML
435
148
0
13 Feb 2024
Test-Time Backdoor Attacks on Multimodal Large Language Models
Dong Lu
Tianyu Pang
Chao Du
Qian Liu
Xianjun Yang
Min Lin
AAML
386
37
0
13 Feb 2024
Discovering Universal Semantic Triggers for Text-to-Image Synthesis
Shengfang Zhai
Weilong Wang
Jiajun Li
Yinpeng Dong
Hang Su
Qingni Shen
EGVM
150
4
0
12 Feb 2024
Prompt Perturbation in Retrieval-Augmented Generation based Large Language Models
Knowledge Discovery and Data Mining (KDD), 2024
Zhibo Hu
Chen Wang
Yanfeng Shu
Helen Paik
Liming Zhu
SILM RALM
217
27
0
11 Feb 2024
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
Mantas Mazeika
Long Phan
Xuwang Yin
Andy Zou
Zifan Wang
...
Nathaniel Li
Steven Basart
Bo Li
David A. Forsyth
Dan Hendrycks
AAML
360
741
0
06 Feb 2024
Beyond Lines and Circles: Unveiling the Geometric Reasoning Gap in Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Spyridon Mouselinos
Henryk Michalewski
Mateusz Malinowski
LRM
187
11
0
06 Feb 2024
PAP-REC: Personalized Automatic Prompt for Recommendation Language Model
Zelong Li
Jianchao Ji
Yingqiang Ge
Qingfeng Lan
Zelong Li
208
7
0
01 Feb 2024
Navigating the OverKill in Large Language Models
Chenyu Shi
Xiao Wang
Qiming Ge
Songyang Gao
Xianjun Yang
Tao Gui
Xuanjing Huang
Xun Zhao
Dahua Lin
219
26
0
31 Jan 2024
Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks
Andy Zhou
Bo Li
Haohan Wang
AAML
428
133
0
30 Jan 2024
Gradient-Based Language Model Red Teaming
Nevan Wichers
Carson E. Denison
Ahmad Beirami
249
41
0
30 Jan 2024
Single Word Change is All You Need: Using LLMs to Create Synthetic Training Examples for Text Classifiers
Lei Xu
Sarah Alnegheimish
Laure Berti-Equille
Alfredo Cuesta-Infante
K. Veeramachaneni
AAML
270
0
0
30 Jan 2024
Tradeoffs Between Alignment and Helpfulness in Language Models with Steering Methods
Yotam Wolf
Noam Wies
Dorin Shteyman
Binyamin Rothberg
Yoav Levine
Amnon Shashua
LLMSV
685
18
0
29 Jan 2024
Black-Box Access is Insufficient for Rigorous AI Audits
Conference on Fairness, Accountability and Transparency (FAccT), 2024
Stephen Casper
Carson Ezell
Charlotte Siegmann
Noam Kolt
Taylor Lynn Curtis
...
Michael Gerovitch
David Bau
Max Tegmark
David M. Krueger
Dylan Hadfield-Menell
AAML
560
133
0
25 Jan 2024
Text Embedding Inversion Security for Multilingual Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yiyi Chen
Heather Lent
Johannes Bjerva
444
24
0
22 Jan 2024
Finding a Needle in the Adversarial Haystack: A Targeted Paraphrasing Approach For Uncovering Edge Cases with Minimal Distribution Distortion
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2024
Aly M. Kassem
Sherif Saad
AAML
301
3
0
21 Jan 2024
PRewrite: Prompt Rewriting with Reinforcement Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Weize Kong
Spurthi Amba Hombaiah
Mingyang Zhang
Qiaozhu Mei
Michael Bendersky
LLMAG
237
38
0
16 Jan 2024
Generative AI in EU Law: Liability, Privacy, Intellectual Property, and Cybersecurity
Social Science Research Network (SSRN), 2024
Claudio Novelli
F. Casolari
Philipp Hacker
Giorgio Spedicato
Luciano Floridi
AILaw SILM
444
99
0
14 Jan 2024
Parameter-Efficient Detoxification with Contrastive Decoding
Tong Niu
Caiming Xiong
Semih Yavuz
Yingbo Zhou
164
17
0
13 Jan 2024
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
International Conference on Machine Learning (ICML), 2024
Andrew Lee
Xiaoyan Bai
Itamar Pres
Martin Wattenberg
Jonathan K. Kummerfeld
Amélie Reymond
324
159
0
03 Jan 2024
SA$^2$VP: Spatially Aligned-and-Adapted Visual Prompt
AAAI Conference on Artificial Intelligence (AAAI), 2023
Wenjie Pei
Tongqi Xia
Fanglin Chen
Jinsong Li
Jiandong Tian
Guangming Lu
VLM VPVLM
181
25
0
16 Dec 2023
SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models
Lee Hyun
Kim Sung-Bin
Seungju Han
Youngjae Yu
Tae-Hyun Oh
414
21
0
15 Dec 2023
Taxonomy-based CheckList for Large Language Model Evaluation
Damin Zhang
149
0
0
15 Dec 2023
Silent Guardian: Protecting Text from Malicious Exploitation by Large Language Models
IEEE Transactions on Information Forensics and Security (IEEE TIFS), 2023
Jiawei Zhao
Kejiang Chen
Xianjian Yuan
Yuang Qi
Weiming Zhang
Neng H. Yu
261
14
0
15 Dec 2023
Dissecting vocabulary biases datasets through statistical testing and automated data augmentation for artifact mitigation in Natural Language Inference
Dat Thanh Nguyen
106
0
0
14 Dec 2023
Accelerating the Global Aggregation of Local Explanations
AAAI Conference on Artificial Intelligence (AAAI), 2023
Alon Mor
Yonatan Belinkov
B. Kimelfeld
FAtt
219
6
0
13 Dec 2023
ToViLaG: Your Visual-Language Generative Model is Also An Evildoer
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Xinpeng Wang
Xiaoyuan Yi
Han Jiang
Shanlin Zhou
Zhihua Wei
Xing Xie
251
25
0
13 Dec 2023
Tell, don't show: Declarative facts influence how LLMs generalize
Alexander Meinke
Owain Evans
224
9
0
12 Dec 2023
LLF-Bench: Benchmark for Interactive Learning from Language Feedback
Ching-An Cheng
Andrey Kolobov
Dipendra Kumar Misra
Allen Nie
Adith Swaminathan
266
24
0
11 Dec 2023