ResearchTrend.AI
© 2026 ResearchTrend.AI. All rights reserved.

Universal Adversarial Triggers for Attacking and Analyzing NLP
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
arXiv: 1908.07125 · 20 August 2019
Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, Sameer Singh
Topics: AAML, SILM

Papers citing "Universal Adversarial Triggers for Attacking and Analyzing NLP"

Showing 50 of 662 citing papers.
SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models. Annual Meeting of the Association for Computational Linguistics (ACL), 2025.
Seanie Lee, Dong Bok Lee, Dominik Wagner, Minki Kang, Haebin Seong, Tobias Bocklet, Juho Lee, Sung Ju Hwang
18 Feb 2025
Universal Adversarial Attack on Aligned Multimodal LLMs
Temurbek Rahmatullaev, Polina Druzhinina, Nikita Kurdiukov, Matvey Mikhalchuk, Andrey Kuznetsov, Anton Razzhigaev
Topics: AAML
11 Feb 2025
SMAB: MAB based word Sensitivity Estimation Framework and its Applications in Adversarial Text Generation. North American Chapter of the Association for Computational Linguistics (NAACL), 2025.
Saurabh Kumar Pandey, S. Vashistha, Debrup Das, Somak Aditya, Monojit Choudhury
Topics: AAML
10 Feb 2025
Democratic Training Against Universal Adversarial Perturbations. International Conference on Learning Representations (ICLR), 2025.
Bing-Jie Sun, Jun Sun, Wei Zhao
Topics: AAML
08 Feb 2025
"I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models
Isha Gupta, David Khachaturov, Robert D. Mullins
Topics: AAML, AuLLM
02 Feb 2025
A Comprehensive Survey of Foundation Models in Medicine. IEEE Reviews in Biomedical Engineering (RBME), 2024.
Wasif Khan, Seowung Leem, Kyle B. See, Joshua K. Wong, Shaoting Zhang, R. Fang
Topics: AI4CE, LM&MA, VLM
17 Jan 2025
CALM: Curiosity-Driven Auditing for Large Language Models. AAAI Conference on Artificial Intelligence (AAAI), 2025.
Xiang Zheng, Longxiang Wang, Yi Liu, Jie Zhang, Chao Shen, Cong Wang
Topics: MLAU
06 Jan 2025
LLM-Virus: Evolutionary Jailbreak Attack on Large Language Models
Miao Yu, Cunchun Li, Yingjie Zhou, Xing Fan, Kun Wang, Shirui Pan, Qingsong Wen
Topics: AAML
03 Jan 2025
Dynamics of Adversarial Attacks on Large Language Model-Based Search Engines
Xiyang Hu
Topics: AAML
01 Jan 2025
GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search
Matan Ben-Tov, Mahmood Sharif
Topics: RALM
30 Dec 2024
Diverse and Effective Red Teaming with Auto-generated Rewards and Multi-step Reinforcement Learning
Alex Beutel, Kai Y. Xiao, Johannes Heidecke, Lilian Weng
Topics: AAML
24 Dec 2024
Robustness of Large Language Models Against Adversarial Attacks
Yiyi Tao, Yixian Shen, Hang Zhang, Yanxin Shen, Lun Wang, Chuanqi Shi, Shaoshuai Du
Topics: AAML
22 Dec 2024
Human-Readable Adversarial Prompts: An Investigation into LLM Vulnerabilities Using Situational Context
Nilanjana Das, Edward Raff, Aman Chadha, Manas Gaur
Topics: AAML
20 Dec 2024
Are Language Models Agnostic to Linguistically Grounded Perturbations? A Case Study of Indic Languages. North American Chapter of the Association for Computational Linguistics (NAACL), 2024.
Poulami Ghosh, Mary Dabre, Pushpak Bhattacharyya
Topics: AAML
14 Dec 2024
The Vulnerability of Language Model Benchmarks: Do They Accurately Reflect True LLM Performance?
Sourav Banerjee, Ayushi Agarwal, Eishkaran Singh
Topics: ELM
02 Dec 2024
Towards Resource Efficient and Interpretable Bias Mitigation in Large Language Models
S. Tong, Eliott Zemour, Rawisara Lohanimit, Lalana Kagal
02 Dec 2024
On the Adversarial Robustness of Instruction-Tuned Large Language Models for Code
Md. Imran Hossen, X. Hei
Topics: AAML, ELM
29 Nov 2024
All-in-one Weather-degraded Image Restoration via Adaptive Degradation-aware Self-prompting Model. IEEE Transactions on Multimedia (IEEE TMM), 2024.
Yuanbo Wen, Tao Gao, Ziqi Li, Jing Zhang, Kaihao Zhang, Ting Chen
Topics: VLM, DiffM
12 Nov 2024
Enhancing Financial Fraud Detection with Human-in-the-Loop Feedback and Feedback Propagation. International Conference on Machine Learning and Applications (ICMLA), 2024.
Prashank Kadam
07 Nov 2024
Achieving Domain-Independent Certified Robustness via Knowledge Continuity. Neural Information Processing Systems (NeurIPS), 2024.
Alan Sun, Chiyu Ma, Kenneth Ge, Soroush Vosoughi
03 Nov 2024
Attacking Misinformation Detection Using Adversarial Examples Generated by Language Models
Piotr Przybyła, Euan McGill, Horacio Saggion
Topics: AAML
28 Oct 2024
Reinforcement Learning for Aligning Large Language Models Agents with Interactive Environments: Quantifying and Mitigating Prompt Overfitting. North American Chapter of the Association for Computational Linguistics (NAACL), 2024.
Mohamed Salim Aissi, Clément Romac, Thomas Carta, Sylvain Lamprier, Pierre-Yves Oudeyer, Olivier Sigaud, Laure Soulier, Nicolas Thome
25 Oct 2024
Adversarial Attacks on Large Language Models Using Regularized Relaxation
Samuel Jacob Chacko, Sajib Biswas, Chashi Mahiul Islam, Fatema Tabassum Liza, Xiuwen Liu
Topics: AAML
24 Oct 2024
Towards Reliable Evaluation of Behavior Steering Interventions in LLMs
Itamar Pres, Laura Ruis, Ekdeep Singh Lubana, David M. Krueger
Topics: LLMSV
22 Oct 2024
AdvAgent: Controllable Blackbox Red-teaming on Web Agents
Chejian Xu, Mintong Kang, Jiawei Zhang, Zeyi Liao, Lingbo Mo, Mengqi Yuan, Huan Sun, Bo Li
Topics: AAML
22 Oct 2024
SPIN: Self-Supervised Prompt INjection
Leon Zhou, Junfeng Yang, Chengzhi Mao
Topics: AAML, SILM
17 Oct 2024
To Err is AI: A Case Study Informing LLM Flaw Reporting Practices. AAAI Conference on Artificial Intelligence (AAAI), 2024.
Sean McGregor, Allyson Ettinger, Nick Judd, Paul Albee, Liwei Jiang, ..., Avijit Ghosh, Christopher Fiorelli, Michelle Hoang, Sven Cattell, Nouha Dziri
15 Oct 2024
Deciphering the Chaos: Enhancing Jailbreak Attacks via Adversarial Prompt Translation
Qizhang Li, Xiaochen Yang, W. Zuo, Yiwen Guo
Topics: AAML
15 Oct 2024
PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning
Tingchen Fu, Mrinank Sharma, Juil Sock, Shay B. Cohen, David M. Krueger, Fazl Barez
Topics: AAML
11 Oct 2024
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates. International Conference on Learning Representations (ICLR), 2024.
Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Jing Jiang, Min Lin
09 Oct 2024
Attribute Controlled Fine-tuning for Large Language Models: A Case Study on Detoxification. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.
Tao Meng, Ninareh Mehrabi, Palash Goyal, Anil Ramakrishna, Aram Galstyan, Richard Zemel, Kai-Wei Chang, Rahul Gupta, Charith Peris
07 Oct 2024
Collaboration! Towards Robust Neural Methods for Routing Problems. Neural Information Processing Systems (NeurIPS), 2024.
Jianan Zhou, Yaoxin Wu, Zhiguang Cao, Wen Song, Jie Zhang, Zhiqi Shen
Topics: AAML
07 Oct 2024
Large Language Models can be Strong Self-Detoxifiers
Ching-Yun Ko, Pin-Yu Chen, Payel Das, Youssef Mroueh, Soham Dan, Georgios Kollias, Subhajit Chaudhury, Tejaswini Pedapati, Luca Daniel
04 Oct 2024
Mitigating Backdoor Threats to Large Language Models: Advancement and Challenges
Qin Liu, Wenjie Mo, Terry Tong, Lyne Tchapmi, Fei Wang, Chaowei Xiao, Muhao Chen
Topics: AAML
30 Sep 2024
Towards Robust Extractive Question Answering Models: Rethinking the Training Methodology. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.
Son Quoc Tran, Matt Kretchmar
Topics: OOD
29 Sep 2024
Trustworthy AI: Securing Sensitive Data in Large Language Models. Applied Informatics (AI), 2024.
G. Feretzakis, V. Verykios
26 Sep 2024
BeanCounter: A low-toxicity, large-scale, and open dataset of business-oriented text. Neural Information Processing Systems (NeurIPS), 2024.
Siyan Wang, Bradford Levy
26 Sep 2024
Data-centric NLP Backdoor Defense from the Lens of Memorization. North American Chapter of the Association for Computational Linguistics (NAACL), 2024.
Zhenting Wang, Zhizhi Wang, Haoyang Ling, Mengnan Du, Juan Zhai, Shiqing Ma
21 Sep 2024
Causal Inference with Large Language Model: A Survey. North American Chapter of the Association for Computational Linguistics (NAACL), 2024.
Jing Ma
Topics: CML, LRM
15 Sep 2024
The Dark Side of Human Feedback: Poisoning Large Language Models via User Inputs
Bocheng Chen, Hanqing Guo, Guangjing Wang, Yuanda Wang, Qiben Yan
Topics: AAML
01 Sep 2024
ContextCite: Attributing Model Generation to Context. Neural Information Processing Systems (NeurIPS), 2024.
Benjamin Cohen-Wang, Harshay Shah, Kristian Georgiev, Aleksander Madry
Topics: LRM
01 Sep 2024
Legilimens: Practical and Unified Content Moderation for Large Language Model Services. Conference on Computer and Communications Security (CCS), 2024.
Jialin Wu, Jiangyi Deng, Shengyuan Pang, Yanjiao Chen, Jiayang Xu, Xinfeng Li, Wei Dong
28 Aug 2024
LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
Nathaniel Li, Ziwen Han, Ian Steneker, Willow Primack, Riley Goodside, Hugh Zhang, Zifan Wang, Cristina Menghini, Summer Yue
Topics: AAML, MU
27 Aug 2024
Large Language Models are Good Attackers: Efficient and Stealthy Textual Backdoor Attacks
Wandi Qiao, Yueqi Zeng, Pengfei Xia, Lei Liu, Zhangjie Fu, Bin Li
Topics: SILM, AAML
21 Aug 2024
Adversarial Attack for Explanation Robustness of Rationalization Models. European Conference on Artificial Intelligence (ECAI), 2024.
Yuankai Zhang, Lingxiao Kong, Haozhao Wang, Ruixuan Li, Jun Wang, Yuhua Li, Wei Liu
Topics: AAML
20 Aug 2024
No Such Thing as a General Learner: Language models and their dual optimization
Emmanuel Chemla, R. Nefdt
18 Aug 2024
Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models. North American Chapter of the Association for Computational Linguistics (NAACL), 2024.
Hila Gonen, Terra Blevins, Alisa Liu, Luke Zettlemoyer, Noah A. Smith
12 Aug 2024
Can Reinforcement Learning Unlock the Hidden Dangers in Aligned Large Language Models?
Mohammad Bahrami Karkevandi, Nishant Vishwamitra, Peyman Najafirad
Topics: AAML
05 Aug 2024
Defining and Evaluating Decision and Composite Risk in Language Models Applied to Natural Language Inference
Ke Shen, Mayank Kejriwal
04 Aug 2024
Mission Impossible: A Statistical Perspective on Jailbreaking LLMs. Neural Information Processing Systems (NeurIPS), 2024.
Jingtong Su, Mingyu Lee, SangKeun Lee
02 Aug 2024
Page 3 of 14