Universal Adversarial Triggers for Attacking and Analyzing NLP

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019

20 August 2019

Papers citing "Universal Adversarial Triggers for Attacking and Analyzing NLP"

50 / 662 papers shown

Benchmark Transparency: Measuring the Impact of Data on Evaluation

Venelin Kovatchev

Matthew Lease

181

31 Mar 2024

LinkPrompt: Natural and Universal Adversarial Attacks on Prompt-based Language Models

Yue Xu

Wenjie Wang

SILM AAML

266

25 Mar 2024

Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study

207

15 Mar 2024

Take Care of Your Prompt Bias! Investigating and Mitigating Prompt Bias in Factual Knowledge Extraction

International Conference on Language Resources and Evaluation (LREC), 2024

Liang Ding

236

15 Mar 2024

ACT-MNMT Auto-Constriction Turning for Multilingual Neural Machine Translation

213

11 Mar 2024

Neural Exec: Learning (and Learning from) Execution Triggers for Prompt Injection Attacks

332

06 Mar 2024

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

...

758

305

05 Mar 2024

Word Importance Explains How Prompts Affect Language Model Outputs

479

05 Mar 2024

Curiosity-driven Red-teaming for Large Language Models

Akash Srivastava

260

29 Feb 2024

Pointing out the Shortcomings of Relation Extraction Models with Semantically Motivated Adversarials

203

29 Feb 2024

On the Challenges and Opportunities in Generative AI

...

762

28 Feb 2024

Fast Adversarial Attacks on Language Models In One GPU Minute

Vinu Sankar Sadasivan

Shoumik Saha

Gaurang Sriramanan

Priyatham Kattakinda

Atoosa Malemir Chegini

Soheil Feizi

MIALM

337

23 Feb 2024

CEV-LM: Controlled Edit Vector Language Model for Shaping Natural Language Generations

188

22 Feb 2024

Coercing LLMs to do and reveal (almost) anything

239

21 Feb 2024

Understanding and Mitigating the Threat of Vec2Text to Dense Retrieval Systems

Shengyao Zhuang

Bevan Koopman

Xiaoran Chu

Guido Zuccon

245

20 Feb 2024

Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question?

Nishant Balepur

Abhilasha Ravichander

Rachel Rudinger

ELM

333

19 Feb 2024

FIPO: Free-form Instruction-oriented Prompt Optimization with Preference Dataset and Modular Fine-tuning Schema

Junru Lu

Siyu An

Min Zhang

Yulan He

Di Yin

Xing Sun

296

19 Feb 2024

Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation

Xiaojun Wan

366

18 Feb 2024

TuneTables: Context Optimization for Scalable Prior-Data Fitted Networks

Katharina Eggensperger

Micah Goldblum

Niv Cohen

Colin White

290

17 Feb 2024

Representation Surgery: Theory and Practice of Affine Steering

Ponnurangam Kumaraguru

LLMSV

494

15 Feb 2024

Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey

Yu Qiao

313

130

14 Feb 2024

Attacking Large Language Models with Projected Gradient Descent

Stephan Günnemann

319

14 Feb 2024

COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability

435

148

13 Feb 2024

Test-Time Backdoor Attacks on Multimodal Large Language Models

386

13 Feb 2024

Discovering Universal Semantic Triggers for Text-to-Image Synthesis

Shengfang Zhai

Weilong Wang

Jiajun Li

Yinpeng Dong

Hang Su

Qingni Shen

EGVM

150

12 Feb 2024

Prompt Perturbation in Retrieval-Augmented Generation based Large Language Models

Knowledge Discovery and Data Mining (KDD), 2024

Liming Zhu

217

11 Feb 2024

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

...

360

741

06 Feb 2024

Beyond Lines and Circles: Unveiling the Geometric Reasoning Gap in Large Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Spyridon Mouselinos

Henryk Michalewski

Mateusz Malinowski

LRM

187

06 Feb 2024

PAP-REC: Personalized Automatic Prompt for Recommendation Language Model

208

01 Feb 2024

Navigating the OverKill in Large Language Models

Xuanjing Huang

Dahua Lin

219

31 Jan 2024

Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks

Andy Zhou

Bo Li

Haohan Wang

AAML

428

133

30 Jan 2024

Gradient-Based Language Model Red Teaming

Nevan Wichers

Carson E. Denison

Ahmad Beirami

249

30 Jan 2024

Single Word Change is All You Need: Using LLMs to Create Synthetic Training Examples for Text Classifiers

Lei Xu

Sarah Alnegheimish

Laure Berti-Equille

Alfredo Cuesta-Infante

K. Veeramachaneni

AAML

270

30 Jan 2024

Tradeoffs Between Alignment and Helpfulness in Language Models with Steering Methods

685

29 Jan 2024

Black-Box Access is Insufficient for Rigorous AI Audits

Conference on Fairness, Accountability and Transparency (FAccT), 2024

...

Dylan Hadfield-Menell

AAML

560

133

25 Jan 2024

Text Embedding Inversion Security for Multilingual Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Yiyi Chen

Heather Lent

Johannes Bjerva

444

22 Jan 2024

Finding a Needle in the Adversarial Haystack: A Targeted Paraphrasing Approach For Uncovering Edge Cases with Minimal Distribution Distortion

Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2024

Aly M. Kassem

Sherif Saad

AAML

301

21 Jan 2024

PRewrite: Prompt Rewriting with Reinforcement Learning

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Weize Kong

Spurthi Amba Hombaiah

237

16 Jan 2024

Generative AI in EU Law: Liability, Privacy, Intellectual Property, and Cybersecurity

Social Science Research Network (SSRN), 2024

444

14 Jan 2024

Parameter-Efficient Detoxification with Contrastive Decoding

Tong Niu

Caiming Xiong

Semih Yavuz

Yingbo Zhou

164

13 Jan 2024

A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity

International Conference on Machine Learning (ICML), 2024

Jonathan K. Kummerfeld

Amélie Reymond

324

159

03 Jan 2024

SA²VP: Spatially Aligned-and-Adapted Visual Prompt

AAAI Conference on Artificial Intelligence (AAAI), 2023

181

16 Dec 2023

SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models

414

15 Dec 2023

Taxonomy-based CheckList for Large Language Model Evaluation

Damin Zhang

149

15 Dec 2023

Silent Guardian: Protecting Text from Malicious Exploitation by Large Language Models

IEEE Transactions on Information Forensics and Security (IEEE TIFS), 2023

261

15 Dec 2023

Dissecting vocabulary biases datasets through statistical testing and automated data augmentation for artifact mitigation in Natural Language Inference

Dat Thanh Nguyen

106

14 Dec 2023

Accelerating the Global Aggregation of Local Explanations

AAAI Conference on Artificial Intelligence (AAAI), 2023

219

13 Dec 2023

ToViLaG: Your Visual-Language Generative Model is Also An Evildoer

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Xing Xie

251

13 Dec 2023

Tell, don't show: Declarative facts influence how LLMs generalize

Alexander Meinke

Owain Evans

224

12 Dec 2023

LLF-Bench: Benchmark for Interactive Learning from Language Feedback

266

11 Dec 2023