ResearchTrend.AI
© 2026 ResearchTrend.AI. All rights reserved.

Universal Adversarial Triggers for Attacking and Analyzing NLP
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
arXiv: 1908.07125 · 20 August 2019
Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, Sameer Singh
Topics: AAML, SILM

Papers citing "Universal Adversarial Triggers for Attacking and Analyzing NLP"

Showing 50 of 662 citing papers.
SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models. Annual Meeting of the Association for Computational Linguistics (ACL), 2025.
Seanie Lee, Dong Bok Lee, Dominik Wagner, Minki Kang, Haebin Seong, Tobias Bocklet, Juho Lee, Sung Ju Hwang
18 Feb 2025
Universal Adversarial Attack on Aligned Multimodal LLMs
Temurbek Rahmatullaev, Polina Druzhinina, Nikita Kurdiukov, Matvey Mikhalchuk, Andrey Kuznetsov, Anton Razzhigaev
Topics: AAML
11 Feb 2025
SMAB: MAB based word Sensitivity Estimation Framework and its Applications in Adversarial Text Generation. North American Chapter of the Association for Computational Linguistics (NAACL), 2025.
Saurabh Kumar Pandey, S. Vashistha, Debrup Das, Somak Aditya, Monojit Choudhury
Topics: AAML
10 Feb 2025
Democratic Training Against Universal Adversarial Perturbations. International Conference on Learning Representations (ICLR), 2025.
Bing-Jie Sun, Jun Sun, Wei Zhao
Topics: AAML
08 Feb 2025
"I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models
Isha Gupta, David Khachaturov, Robert D. Mullins
Topics: AAML, AuLLM
02 Feb 2025
A Comprehensive Survey of Foundation Models in Medicine. IEEE Reviews in Biomedical Engineering (RBME), 2024.
Wasif Khan, Seowung Leem, Kyle B. See, Joshua K. Wong, Shaoting Zhang, R. Fang
Topics: AI4CE, LM&MA, VLM
17 Jan 2025
CALM: Curiosity-Driven Auditing for Large Language Models. AAAI Conference on Artificial Intelligence (AAAI), 2025.
Xiang Zheng, Longxiang Wang, Yi Liu, Jie Zhang, Chao Shen, Cong Wang
Topics: MLAU
06 Jan 2025
LLM-Virus: Evolutionary Jailbreak Attack on Large Language Models
Miao Yu, Cunchun Li, Yingjie Zhou, Xing Fan, Kun Wang, Shirui Pan, Qingsong Wen
Topics: AAML
03 Jan 2025
Dynamics of Adversarial Attacks on Large Language Model-Based Search Engines
Xiyang Hu
Topics: AAML
01 Jan 2025
GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search
Matan Ben-Tov, Mahmood Sharif
Topics: RALM
30 Dec 2024
Diverse and Effective Red Teaming with Auto-generated Rewards and Multi-step Reinforcement Learning
Alex Beutel, Kai Y. Xiao, Johannes Heidecke, Lilian Weng
Topics: AAML
24 Dec 2024
Robustness of Large Language Models Against Adversarial Attacks
Yiyi Tao, Yixian Shen, Hang Zhang, Yanxin Shen, Lun Wang, Chuanqi Shi, Shaoshuai Du
Topics: AAML
22 Dec 2024
Human-Readable Adversarial Prompts: An Investigation into LLM Vulnerabilities Using Situational Context
Nilanjana Das, Edward Raff, Aman Chadha, Manas Gaur
Topics: AAML
20 Dec 2024
Are Language Models Agnostic to Linguistically Grounded Perturbations? A Case Study of Indic Languages. North American Chapter of the Association for Computational Linguistics (NAACL), 2024.
Poulami Ghosh, Mary Dabre, Pushpak Bhattacharyya
Topics: AAML
14 Dec 2024
The Vulnerability of Language Model Benchmarks: Do They Accurately Reflect True LLM Performance?
Sourav Banerjee, Ayushi Agarwal, Eishkaran Singh
Topics: ELM
02 Dec 2024
Towards Resource Efficient and Interpretable Bias Mitigation in Large Language Models
S. Tong, Eliott Zemour, Rawisara Lohanimit, Lalana Kagal
02 Dec 2024
On the Adversarial Robustness of Instruction-Tuned Large Language Models for Code
Md. Imran Hossen, X. Hei
Topics: AAML, ELM
29 Nov 2024
All-in-one Weather-degraded Image Restoration via Adaptive Degradation-aware Self-prompting Model. IEEE Transactions on Multimedia (IEEE TMM), 2024.
Yuanbo Wen, Tao Gao, Ziqi Li, Jing Zhang, Kaihao Zhang, Ting Chen
Topics: VLM, DiffM
12 Nov 2024
Enhancing Financial Fraud Detection with Human-in-the-Loop Feedback and Feedback Propagation. International Conference on Machine Learning and Applications (ICMLA), 2024.
Prashank Kadam
07 Nov 2024
Achieving Domain-Independent Certified Robustness via Knowledge Continuity. Neural Information Processing Systems (NeurIPS), 2024.
Alan Sun, Chiyu Ma, Kenneth Ge, Soroush Vosoughi
03 Nov 2024
Attacking Misinformation Detection Using Adversarial Examples Generated by Language Models
Piotr Przybyła, Euan McGill, Horacio Saggion
Topics: AAML
28 Oct 2024
Reinforcement Learning for Aligning Large Language Models Agents with Interactive Environments: Quantifying and Mitigating Prompt Overfitting. North American Chapter of the Association for Computational Linguistics (NAACL), 2024.
Mohamed Salim Aissi, Clément Romac, Thomas Carta, Sylvain Lamprier, Pierre-Yves Oudeyer, Olivier Sigaud, Laure Soulier, Nicolas Thome
25 Oct 2024
Adversarial Attacks on Large Language Models Using Regularized Relaxation
Samuel Jacob Chacko, Sajib Biswas, Chashi Mahiul Islam, Fatema Tabassum Liza, Xiuwen Liu
Topics: AAML
24 Oct 2024
Towards Reliable Evaluation of Behavior Steering Interventions in LLMs
Itamar Pres, Laura Ruis, Ekdeep Singh Lubana, David M. Krueger
Topics: LLMSV
22 Oct 2024
AdvAgent: Controllable Blackbox Red-teaming on Web Agents
Chejian Xu, Mintong Kang, Jiawei Zhang, Zeyi Liao, Lingbo Mo, Mengqi Yuan, Huan Sun, Bo Li
Topics: AAML
22 Oct 2024
SPIN: Self-Supervised Prompt INjection
Leon Zhou, Junfeng Yang, Chengzhi Mao
Topics: AAML, SILM
17 Oct 2024
To Err is AI: A Case Study Informing LLM Flaw Reporting Practices. AAAI Conference on Artificial Intelligence (AAAI), 2024.
Sean McGregor, Allyson Ettinger, Nick Judd, Paul Albee, Liwei Jiang, ..., Avijit Ghosh, Christopher Fiorelli, Michelle Hoang, Sven Cattell, Nouha Dziri
15 Oct 2024
Deciphering the Chaos: Enhancing Jailbreak Attacks via Adversarial Prompt Translation
Qizhang Li, Xiaochen Yang, W. Zuo, Yiwen Guo
Topics: AAML
15 Oct 2024
PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning
Tingchen Fu, Mrinank Sharma, Juil Sock, Shay B. Cohen, David M. Krueger, Fazl Barez
Topics: AAML
11 Oct 2024
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates. International Conference on Learning Representations (ICLR), 2024.
Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Jing Jiang, Min Lin
09 Oct 2024
Attribute Controlled Fine-tuning for Large Language Models: A Case Study on Detoxification. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.
Tao Meng, Ninareh Mehrabi, Palash Goyal, Anil Ramakrishna, Aram Galstyan, Richard Zemel, Kai-Wei Chang, Rahul Gupta, Charith Peris
07 Oct 2024
Collaboration! Towards Robust Neural Methods for Routing Problems. Neural Information Processing Systems (NeurIPS), 2024.
Jianan Zhou, Yaoxin Wu, Zhiguang Cao, Wen Song, Jie Zhang, Zhiqi Shen
Topics: AAML
07 Oct 2024
Large Language Models can be Strong Self-Detoxifiers
Ching-Yun Ko, Pin-Yu Chen, Payel Das, Youssef Mroueh, Soham Dan, Georgios Kollias, Subhajit Chaudhury, Tejaswini Pedapati, Luca Daniel
04 Oct 2024
Mitigating Backdoor Threats to Large Language Models: Advancement and Challenges
Qin Liu, Wenjie Mo, Terry Tong, Lyne Tchapmi, Fei Wang, Chaowei Xiao, Muhao Chen
Topics: AAML
30 Sep 2024
Towards Robust Extractive Question Answering Models: Rethinking the Training Methodology. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.
Son Quoc Tran, Matt Kretchmar
Topics: OOD
29 Sep 2024
Trustworthy AI: Securing Sensitive Data in Large Language Models. Applied Informatics (AI), 2024.
G. Feretzakis, V. Verykios
26 Sep 2024
BeanCounter: A low-toxicity, large-scale, and open dataset of business-oriented text. Neural Information Processing Systems (NeurIPS), 2024.
Siyan Wang, Bradford Levy
26 Sep 2024
Data-centric NLP Backdoor Defense from the Lens of Memorization. North American Chapter of the Association for Computational Linguistics (NAACL), 2024.
Zhenting Wang, Zhizhi Wang, Haoyang Ling, Mengnan Du, Juan Zhai, Shiqing Ma
21 Sep 2024
Causal Inference with Large Language Model: A Survey. North American Chapter of the Association for Computational Linguistics (NAACL), 2024.
Jing Ma
Topics: CML, LRM
15 Sep 2024
The Dark Side of Human Feedback: Poisoning Large Language Models via User Inputs
Bocheng Chen, Hanqing Guo, Guangjing Wang, Yuanda Wang, Qiben Yan
Topics: AAML
01 Sep 2024
ContextCite: Attributing Model Generation to Context. Neural Information Processing Systems (NeurIPS), 2024.
Benjamin Cohen-Wang, Harshay Shah, Kristian Georgiev, Aleksander Madry
Topics: LRM
01 Sep 2024
Legilimens: Practical and Unified Content Moderation for Large Language Model Services. Conference on Computer and Communications Security (CCS), 2024.
Jialin Wu, Jiangyi Deng, Shengyuan Pang, Yanjiao Chen, Jiayang Xu, Xinfeng Li, Wei Dong
28 Aug 2024
LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
Nathaniel Li, Ziwen Han, Ian Steneker, Willow Primack, Riley Goodside, Hugh Zhang, Zifan Wang, Cristina Menghini, Summer Yue
Topics: AAML, MU
27 Aug 2024
Large Language Models are Good Attackers: Efficient and Stealthy Textual Backdoor Attacks
Wandi Qiao, Yueqi Zeng, Pengfei Xia, Lei Liu, Zhangjie Fu, Bin Li
Topics: SILM, AAML
21 Aug 2024
Adversarial Attack for Explanation Robustness of Rationalization Models. European Conference on Artificial Intelligence (ECAI), 2024.
Yuankai Zhang, Lingxiao Kong, Haozhao Wang, Ruixuan Li, Jun Wang, Yuhua Li, Wei Liu
Topics: AAML
20 Aug 2024
No Such Thing as a General Learner: Language models and their dual optimization
Emmanuel Chemla, R. Nefdt
18 Aug 2024
Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models. North American Chapter of the Association for Computational Linguistics (NAACL), 2024.
Hila Gonen, Terra Blevins, Alisa Liu, Luke Zettlemoyer, Noah A. Smith
12 Aug 2024
Can Reinforcement Learning Unlock the Hidden Dangers in Aligned Large Language Models?
Mohammad Bahrami Karkevandi, Nishant Vishwamitra, Peyman Najafirad
Topics: AAML
05 Aug 2024
Defining and Evaluating Decision and Composite Risk in Language Models Applied to Natural Language Inference
Ke Shen, Mayank Kejriwal
04 Aug 2024
Mission Impossible: A Statistical Perspective on Jailbreaking LLMs. Neural Information Processing Systems (NeurIPS), 2024.
Jingtong Su, Mingyu Lee, SangKeun Lee
02 Aug 2024
Page 3 of 14