Adversarial Training for Large Neural Language Models

20 April 2020

Xiaodong Liu

Papers citing "Adversarial Training for Large Neural Language Models"

50 / 124 papers shown

Explainable Transformer-Based Email Phishing Classification with Adversarial Robustness

Sajad U P

AAML

349

15 Nov 2025

Generative AI for Biosciences: Emerging Threats and Roadmap to Biosecurity

...

452

13 Oct 2025

SAGE: A Realistic Benchmark for Semantic Understanding

140

25 Sep 2025

Attacking LLMs and AI Agents: Advertisement Embedding Attacks Against Large Language Models

Qiming Guo

Jinwen Tang

Xingran Huang

167

25 Aug 2025

CCFC: Core & Core-Full-Core Dual-Track Defense for LLM Jailbreak Protection

Jiaming Hu

Haoyu Wang

Debarghya Mukherjee

Ioannis Ch. Paschalidis

AAML

107

19 Aug 2025

PRM-Free Security Alignment of Large Models via Red Teaming and Adversarial Training

Pengfei Du

AAML

157

14 Jul 2025

Multi-level Value Alignment in Agentic AI Systems: Survey and Perspectives

...

492

11 Jun 2025

LLMs are Frequency Pattern Learners in Natural Language Inference

Liang Cheng

Zhaowei Wang

Mark Steedman

232

27 May 2025

Retrieval-Augmented Purifier for Robust LLM-Empowered Recommendation

326

03 Apr 2025

Enhancing Adversarial Robustness of Vision-Language Models through Low-Rank Adaptation

International Conference on Multimedia Retrieval (ICMR), 2024

411

21 Feb 2025

Evaluating Concurrent Robustness of Language Models Across Diverse Challenge Sets

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

607

03 Jan 2025

Achieving Domain-Independent Certified Robustness via Knowledge Continuity

Neural Information Processing Systems (NeurIPS), 2024

299

03 Nov 2024

Adversarial Training: A Survey

Lihe Zhang

Baocai Yin

316

19 Oct 2024

Estimating the Probabilities of Rare Outputs in Language Models

International Conference on Learning Representations (ICLR), 2024

Gabriel Wu

Jacob Hilton

AAML UQCV

379

17 Oct 2024

Recent Advances in Attack and Defense Approaches of Large Language Models

354

05 Sep 2024

MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues

AAAI Conference on Artificial Intelligence (AAAI), 2024

Kuluhan Binici

Abhinav Ramesh Kashyap

Viktor Schlegel

Andy T. Liu

Vijay Prakash Dwivedi

Thanh-Tung Nguyen

Xiaoxue Gao

Nancy F. Chen

Stefan Winkler

235

26 Aug 2024

Detecting and Understanding Vulnerabilities in Language Models via Mechanistic Interpretability

International Joint Conference on Artificial Intelligence (IJCAI), 2024

Jorge García-Carrasco

A. Maté

Juan Trujillo

AAML

215

29 Jul 2024

Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference

Eric Wong

707

21 Jun 2024

Efficient Adversarial Training in LLMs with Continuous Attacks

Stephan Günnemann

362

24 May 2024

PICLe: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning

International Conference on Machine Learning (ICML), 2024

Hyeong Kyu Choi

Yixuan Li

333

03 May 2024

Adversarial Attacks and Defense for Conversation Entailment Task

225

01 May 2024

Defending Against Unforeseen Failure Modes with Latent Adversarial Training

Stephen Casper

Lennart Schulze

Oam Patel

Dylan Hadfield-Menell

AAML

729

08 Mar 2024

On the Challenges and Opportunities in Generative AI

...

827

28 Feb 2024

Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space

Stephan Günnemann

481

14 Feb 2024

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

...

383

802

06 Feb 2024

IllusionX: An LLM-powered mixed reality personal companion

206

04 Feb 2024

Building Guardrails for Large Language Models

424

02 Feb 2024

Black-Box Access is Insufficient for Rigorous AI Audits

Conference on Fairness, Accountability and Transparency (FAccT), 2024

...

Dylan Hadfield-Menell

AAML

568

139

25 Jan 2024

Fast Adversarial Training against Textual Adversarial Attacks

189

23 Jan 2024

METAL: Metamorphic Testing Framework for Analyzing Large-Language Model Qualities

Sangwon Hyun

Mingyu Guo

Muhammad Ali Babar

237

11 Dec 2023

Prompt Optimization via Adversarial In-Context Learning

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

445

05 Dec 2023

A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly

High-Confidence Computing (HC), 2023

626

988

04 Dec 2023

Improving the Robustness of Transformer-based Large Language Models with Dynamic Attention

Network and Distributed System Security Symposium (NDSS), 2023

Yuwen Pu

Xuhong Zhang

203

29 Nov 2023

Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information

247

20 Nov 2023

Hijacking Large Language Models via Adversarial In-Context Learning

546

16 Nov 2023

Robust Text Classification: Analyzing Prototype-Based Networks

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

284

11 Nov 2023

BERT Lost Patience Won't Be Robust to Adversarial Slowdown

Neural Information Processing Systems (NeurIPS), 2023

338

29 Oct 2023

Data Optimization in Deep Learning: A Survey

IEEE Transactions on Knowledge and Data Engineering (TKDE), 2023

Ou Wu

Rujing Yao

342

25 Oct 2023

VIBE: Topic-Driven Temporal Adaptation for Twitter Classification

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

450

16 Oct 2023

SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks

George J. Pappas

599

409

05 Oct 2023

GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts

1.0K

531

19 Sep 2023

SentimentGPT: Exploiting GPT for Advanced Sentiment Analysis and its Departure from Current Machine Learning

Kiana Kheiri

Hamid Karimi

252

106

16 Jul 2023

A Comprehensive Overview of Large Language Models

ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023

Saeed Anwar

Muhammad Usman

1.0K

1,315

12 Jul 2023

MAT: Mixed-Strategy Game of Adversarial Training in Fine-tuning

International Joint Conference on Artificial Intelligence (IJCAI), 2023

Zhehua Zhong

Tianyi Chen

Zhen Wang

AAML

139

27 Jun 2023

Modeling Hierarchical Reasoning Chains by Linking Discourse Units and Key Phrases for Reading Comprehension

International Conference on Computational Linguistics (COLING), 2023

223

21 Jun 2023

Prompt Injection attack against LLM-integrated Applications

Yi Liu

Kailong Wang

...

Haoyu Wang

Leo Yu Zhang

Yang Liu

SILM

521

598

08 Jun 2023

Toward Adversarial Training on Contextualized Language Representation

International Conference on Learning Representations (ICLR), 2023

165

08 May 2023

USTC-NELSLIP at SemEval-2023 Task 2: Statistical Construction and Dual Adaptation of Gazetteer for Multilingual Complex NER

International Workshop on Semantic Evaluation (SemEval), 2023

205

04 May 2023

A Review of ChatGPT Applications in Education, Marketing, Software Engineering, and Healthcare: Benefits, Drawbacks, and Research Directions

Mohammad Fraiwan

Natheer Khasawneh

282

29 Apr 2023

CRL+: A Novel Semi-Supervised Deep Active Contrastive Representation Learning-Based Text Classification Model for Insurance Data

Journal of Advances in Information Technology (JAIT), 2023

163

08 Feb 2023