2004.08994
Cited By
Adversarial Training for Large Neural Language Models
20 April 2020
Xiaodong Liu
Hao Cheng
Pengcheng He
Weizhu Chen
Yu Wang
Hoifung Poon
Jianfeng Gao
AAML
Github (2250★)
Papers citing
"Adversarial Training for Large Neural Language Models"
50 / 124 papers shown
Explainable Transformer-Based Email Phishing Classification with Adversarial Robustness
Sajad U P
AAML
330
0
0
15 Nov 2025
Generative AI for Biosciences: Emerging Threats and Roadmap to Biosecurity
Zaixi Zhang
Souradip Chakraborty
Amrit Singh Bedi
Emilin Mathew
Varsha Saravanan
...
Eric Xing
R. Altman
George Church
M. Y. Wang
Mengdi Wang
SILM
444
1
0
13 Oct 2025
SAGE: A Realistic Benchmark for Semantic Understanding
Samarth Goel
Reagan J. Lee
Kannan Ramchandran
ELM
VLM
108
1
0
25 Sep 2025
Attacking LLMs and AI Agents: Advertisement Embedding Attacks Against Large Language Models
Qiming Guo
Jinwen Tang
Xingran Huang
167
1
0
25 Aug 2025
CCFC: Core & Core-Full-Core Dual-Track Defense for LLM Jailbreak Protection
Jiaming Hu
Haoyu Wang
Debarghya Mukherjee
Ioannis Ch. Paschalidis
AAML
100
0
0
19 Aug 2025
PRM-Free Security Alignment of Large Models via Red Teaming and Adversarial Training
Pengfei Du
AAML
155
2
0
14 Jul 2025
Multi-level Value Alignment in Agentic AI Systems: Survey and Perspectives
Wei Zeng
Hengshu Zhu
Chuan Qin
Han Wu
Yihang Cheng
...
Xiaowei Jin
Yinuo Shen
Zhenxing Wang
Feimin Zhong
Hui Xiong
AI4TS
447
0
0
11 Jun 2025
LLMs are Frequency Pattern Learners in Natural Language Inference
Liang Cheng
Zhaowei Wang
Mark Steedman
212
1
0
27 May 2025
Retrieval-Augmented Purifier for Robust LLM-Empowered Recommendation
Liangbo Ning
Wenqi Fan
Qing Li
AAML
313
4
0
03 Apr 2025
Enhancing Adversarial Robustness of Vision-Language Models through Low-Rank Adaptation
International Conference on Multimedia Retrieval (ICMR), 2024
Yuheng Ji
Yue Liu
Zhicheng Zhang
Zhao Zhang
Yuting Zhao
Gang Zhou
Xingwei Zhang
Xinwang Liu
Xiaolong Zheng
VLM
410
4
0
21 Feb 2025
Evaluating Concurrent Robustness of Language Models Across Diverse Challenge Sets
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Vatsal Gupta
Pranshu Pandya
Tushar Kataria
Vivek Gupta
Dan Roth
AAML
582
2
0
03 Jan 2025
Achieving Domain-Independent Certified Robustness via Knowledge Continuity
Neural Information Processing Systems (NeurIPS), 2024
Alan Sun
Chiyu Ma
Kenneth Ge
Soroush Vosoughi
299
2
0
03 Nov 2024
Adversarial Training: A Survey
Mengnan Zhao
Lihe Zhang
Jingwen Ye
Huchuan Lu
Baocai Yin
Xinchao Wang
AAML
310
12
0
19 Oct 2024
Estimating the Probabilities of Rare Outputs in Language Models
International Conference on Learning Representations (ICLR), 2024
Gabriel Wu
Jacob Hilton
AAML
UQCV
355
4
0
17 Oct 2024
Recent Advances in Attack and Defense Approaches of Large Language Models
Jing Cui
Yishi Xu
Zhewei Huang
Shuchang Zhou
Jianbin Jiao
Junge Zhang
PILM
AAML
351
9
0
05 Sep 2024
MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues
AAAI Conference on Artificial Intelligence (AAAI), 2024
Kuluhan Binici
Abhinav Ramesh Kashyap
Viktor Schlegel
Andy T. Liu
Vijay Prakash Dwivedi
Thanh-Tung Nguyen
Xiaoxue Gao
Nancy F. Chen
Stefan Winkler
235
6
0
26 Aug 2024
Detecting and Understanding Vulnerabilities in Language Models via Mechanistic Interpretability
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Jorge García-Carrasco
A. Maté
Juan Trujillo
AAML
215
6
0
29 Jul 2024
Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference
Anton Xue
Avishree Khare
Rajeev Alur
Surbhi Goel
Eric Wong
698
4
0
21 Jun 2024
Efficient Adversarial Training in LLMs with Continuous Attacks
Sophie Xhonneux
Alessandro Sordoni
Stephan Günnemann
Gauthier Gidel
Leo Schwinn
AAML
353
97
0
24 May 2024
PICLe: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning
International Conference on Machine Learning (ICML), 2024
Hyeong Kyu Choi
Yixuan Li
308
27
0
03 May 2024
Adversarial Attacks and Defense for Conversation Entailment Task
Zhenning Yang
Ryan Krawec
Liang-Yuan Wu
AAML
SILM
207
1
0
01 May 2024
Defending Against Unforeseen Failure Modes with Latent Adversarial Training
Stephen Casper
Lennart Schulze
Oam Patel
Dylan Hadfield-Menell
AAML
727
62
0
08 Mar 2024
On the Challenges and Opportunities in Generative AI
Laura Manduchi
Kushagra Pandey
Robert Bamler
Sina Daubener
...
Yixin Wang
F. Wenzel
Frank Wood
Stephan Mandt
Vincent Fortuin
798
40
0
28 Feb 2024
Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space
Leo Schwinn
David Dobre
Sophie Xhonneux
Gauthier Gidel
Stephan Günnemann
AAML
476
81
0
14 Feb 2024
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
Mantas Mazeika
Long Phan
Xuwang Yin
Andy Zou
Zifan Wang
...
Nathaniel Li
Steven Basart
Bo Li
David A. Forsyth
Dan Hendrycks
AAML
383
781
0
06 Feb 2024
IllusionX: An LLM-powered mixed reality personal companion
Ramez Yousri
Zeyad Essam
Yehia Kareem
Youstina Sherief
Sherry Gamil
Soha Safwat
197
11
0
04 Feb 2024
Building Guardrails for Large Language Models
Yizhen Dong
Ronghui Mu
Gao Jin
Yi Qi
Jinwei Hu
Xingyu Zhao
Jie Meng
Wenjie Ruan
Xiaowei Huang
OffRL
418
70
0
02 Feb 2024
Black-Box Access is Insufficient for Rigorous AI Audits
Conference on Fairness, Accountability and Transparency (FAccT), 2024
Stephen Casper
Carson Ezell
Charlotte Siegmann
Noam Kolt
Taylor Lynn Curtis
...
Michael Gerovitch
David Bau
Max Tegmark
David M. Krueger
Dylan Hadfield-Menell
AAML
568
136
0
25 Jan 2024
Fast Adversarial Training against Textual Adversarial Attacks
Yichen Yang
Xin Liu
Kun He
AAML
186
6
0
23 Jan 2024
METAL: Metamorphic Testing Framework for Analyzing Large-Language Model Qualities
Sangwon Hyun
Mingyu Guo
Muhammad Ali Babar
230
29
0
11 Dec 2023
Prompt Optimization via Adversarial In-Context Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Do Xuan Long
Yiran Zhao
Hannah Brown
Yuxi Xie
James Xu Zhao
Nancy F. Chen
Kenji Kawaguchi
Michael Qizhe Xie
Junxian He
439
28
0
05 Dec 2023
A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly
High-Confidence Computing (HC), 2023
Yifan Yao
Jinhao Duan
Kaidi Xu
Yuanfang Cai
Eric Sun
Yue Zhang
PILM
ELM
624
972
0
04 Dec 2023
Improving the Robustness of Transformer-based Large Language Models with Dynamic Attention
Network and Distributed System Security Symposium (NDSS), 2023
Lujia Shen
Yuwen Pu
R. Beyah
Changjiang Li
Xuhong Zhang
Chunpeng Ge
Ting Wang
AAML
194
11
0
29 Nov 2023
Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information
Zhengmian Hu
Gang Wu
Saayan Mitra
Ruiyi Zhang
Tong Sun
Heng-Chiao Huang
Vishy Swaminathan
244
36
0
20 Nov 2023
Hijacking Large Language Models via Adversarial In-Context Learning
Yao Qiang
Xiangyu Zhou
Saleh Zare Zade
Prashant Khanduri
Dongxiao Zhu
514
48
0
16 Nov 2023
Robust Text Classification: Analyzing Prototype-Based Networks
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Zhivar Sourati
D. Deshpande
Filip Ilievski
Kiril Gashteovski
S. Saralajew
OOD
OffRL
283
7
0
11 Nov 2023
BERT Lost Patience Won't Be Robust to Adversarial Slowdown
Neural Information Processing Systems (NeurIPS), 2023
Zachary Coalson
Gabriel Ritter
Rakesh Bobba
Sanghyun Hong
AAML
332
2
0
29 Oct 2023
Data Optimization in Deep Learning: A Survey
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2023
Ou Wu
Rujing Yao
341
6
0
25 Oct 2023
VIBE: Topic-Driven Temporal Adaptation for Twitter Classification
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yuji Zhang
Jing Li
Wenjie Li
VLM
438
18
0
16 Oct 2023
SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks
Avi Schwarzschild
Eric Wong
Hamed Hassani
George J. Pappas
AAML
593
398
0
05 Oct 2023
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
985
528
0
19 Sep 2023
SentimentGPT: Exploiting GPT for Advanced Sentiment Analysis and its Departure from Current Machine Learning
Kiana Kheiri
Hamid Karimi
247
106
0
16 Jul 2023
A Comprehensive Overview of Large Language Models
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023
Humza Naveed
Asad Ullah Khan
Shi Qiu
Muhammad Saqib
Saeed Anwar
Muhammad Usman
Naveed Akhtar
Nick Barnes
Lin Wang
OffRL
906
1,259
0
12 Jul 2023
MAT: Mixed-Strategy Game of Adversarial Training in Fine-tuning
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Zhehua Zhong
Tianyi Chen
Zhen Wang
AAML
139
6
0
27 Jun 2023
Modeling Hierarchical Reasoning Chains by Linking Discourse Units and Key Phrases for Reading Comprehension
International Conference on Computational Linguistics (COLING), 2023
Pascal Hitzler
Zhuosheng Zhang
Shafiq Joty
AI4CE
198
8
0
21 Jun 2023
Prompt Injection attack against LLM-integrated Applications
Yi Liu
Gelei Deng
Yuekang Li
Kailong Wang
Zihao Wang
...
Yepang Liu
Haoyu Wang
Yanhong Zheng
Leo Yu Zhang
Yang Liu
SILM
521
586
0
08 Jun 2023
Toward Adversarial Training on Contextualized Language Representation
International Conference on Learning Representations (ICLR), 2023
Hongqiu Wu
Wenshu Fan
Han Shi
Haizhen Zhao
Hao Fei
AAML
161
15
0
08 May 2023
USTC-NELSLIP at SemEval-2023 Task 2: Statistical Construction and Dual Adaptation of Gazetteer for Multilingual Complex NER
International Workshop on Semantic Evaluation (SemEval), 2023
Jun-Yu Ma
Jia-Chen Gu
Jiajun Qi
Zhen-Hua Ling
Quan Liu
Xiaoyi Zhao
197
3
0
04 May 2023
A Review of ChatGPT Applications in Education, Marketing, Software Engineering, and Healthcare: Benefits, Drawbacks, and Research Directions
Mohammad Fraiwan
Natheer Khasawneh
243
59
0
29 Apr 2023
CRL+: A Novel Semi-Supervised Deep Active Contrastive Representation Learning-Based Text Classification Model for Insurance Data
Journal of Advances in Information Technology (JAIT), 2023
Amir Namavar Jahromi
Ebrahim Pourjafari
H. Karimipour
Amit Satpathy
Lovell Hodge
155
4
0
08 Feb 2023