arXiv: 2103.10013
Model Extraction and Adversarial Transferability, Your BERT is Vulnerable!
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Xuanli He, Lingjuan Lyu, Xingliang Yuan, Lichao Sun
18 March 2021
Papers citing "Model Extraction and Adversarial Transferability, Your BERT is Vulnerable!" (50 of 52 shown)

Adversarial Defence without Adversarial Defence: Enhancing Language Model Robustness via Instance-level Principal Component Removal
Yang Wang, Chenghao Xiao, Yi Zhou, Stuart E. Middleton, Noura Al Moubayed, C. D. Lin
29 Jul 2025

Coordinated Robustness Evaluation Framework for Vision-Language Models
Ashwin Ramesh Babu, Sajad Mousavi, Vineet Gundecha, Sahand Ghorbanpour, Avisek Naug, Antonio Guillen, Ricardo Luna Gutierrez, Soumyendu Sarkar
05 Jun 2025

Model Stealing for Any Low-Rank Language Model
Symposium on the Theory of Computing (STOC), 2024
Allen Liu, Ankur Moitra
12 Nov 2024

A Middle Path for On-Premises LLM Deployment: Preserving Privacy Without Sacrificing Model Confidentiality
Hanbo Huang, Yihan Li, Bowen Jiang, Bo Jiang, Lin Liu, Tian Ding, Zhuotao Liu, Shiyu Liang
15 Oct 2024

Privacy Evaluation Benchmarks for NLP Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Wei Huang, Yinggui Wang, Cen Chen
24 Sep 2024

WET: Overcoming Paraphrasing Vulnerabilities in Embeddings-as-a-Service with Linear Transformation Watermarks
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Anudeex Shetty, Xingliang Yuan, Jey Han Lau
29 Aug 2024

VidModEx: Interpretable and Efficient Black Box Model Extraction for High-Dimensional Spaces
Somnath Sendhil Kumar, Yuvaraj Govindarajulu, Pavan Kulkarni, Manojkumar Somabhai Parmar
04 Aug 2024

Risks, Causes, and Mitigations of Widespread Deployments of Large Language Models (LLMs): A Survey
Md. Nazmus Sakib, Md Athikul Islam, Royal Pathak, Md Mashrur Arifin
01 Aug 2024

Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann, Madhavan Seshadri, Anu Pradhan, Tom Ault, Leslie Barrett, David Rabinowitz, John Doucette, Nhathai Phan
20 Jul 2024

Image-to-Text Logic Jailbreak: Your Imagination can Help You Do Anything
Xiaotian Zou, Ke Li, Yongkang Chen
01 Jul 2024

IDT: Dual-Task Adversarial Attacks for Privacy Protection
Pedro Faustini, Shakila Mahjabin Tonni, Annabelle McIver, Xingliang Yuan, Mark Dras
28 Jun 2024

Transferable Embedding Inversion Attack: Uncovering Privacy Risks in Text Embeddings without Model Queries
Yu-Hsiang Huang, Yuche Tsai, Hsiang Hsiao, Hong-Yi Lin, Shou-De Lin
12 Jun 2024

The Impact of Quantization on the Robustness of Transformer-based Text Classifiers
Seyed Parsa Neshaei, Yasaman Boreshban, Gholamreza Ghassem-Sani, Seyed Abolghasem Mirroshandel
08 Mar 2024

WARDEN: Multi-Directional Backdoor Watermarks for Embedding-as-a-Service Copyright Protection
Anudeex Shetty, Yue Teng, Ke He, Xingliang Yuan
03 Mar 2024

Amplifying Training Data Exposure through Fine-Tuning with Pseudo-Labeled Memberships
Myung Gyo Oh, Hong Eun Ahn, L. Park, T.-H. Kwon
19 Feb 2024

PAL: Proxy-Guided Black-Box Attack on Large Language Models
Chawin Sitawarin, Norman Mu, David Wagner, Alexandre Araujo
15 Feb 2024

Revealing Vulnerabilities in Stable Diffusion via Targeted Attacks
Chenyu Zhang, Yiwen Ma, Anan Liu
16 Jan 2024

Punctuation Matters! Stealthy Backdoor Attack for Language Models
Xuan Sheng, Zhicheng Li, Zhaoyang Han, Xiangmao Chang, Piji Li
26 Dec 2023

SenTest: Evaluating Robustness of Sentence Encoders
Tanmay Chavan, Shantanu Patankar, Aditya Kane, Omkar Gokhale, Geetanjali Kale, Raviraj Joshi
29 Nov 2023

Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration
Neural Information Processing Systems (NeurIPS), 2023
Wenjie Fu, Huandong Wang, Chen Gao, Guanghua Liu, Yong Li, Tao Jiang
10 Nov 2023

Army of Thieves: Enhancing Black-Box Model Extraction via Ensemble based sample selection
Akshit Jindal, Vikram Goyal, Saket Anand, Chetan Arora
08 Nov 2023

A Survey on Transferability of Adversarial Examples across Deep Neural Networks
Jindong Gu, Yang Liu, Pau de Jorge, Wenqain Yu, Xinwei Liu, ..., Anjun Hu, Ashkan Khakzar, Zhijiang Li, Simeng Qin, Juil Sock
26 Oct 2023

BufferSearch: Generating Black-Box Adversarial Texts With Lower Queries
Wenjie Lv, Zhen Wang, Yitao Zheng, Zhehua Zhong, Qi Xuan, Tianyi Chen
14 Oct 2023

The Trickle-down Impact of Reward (In-)consistency on RLHF
Lingfeng Shen, Sihao Chen, Linfeng Song, Lifeng Jin, Baolin Peng, Haitao Mi, Daniel Khashabi, Dong Yu
28 Sep 2023

Evaluating the Robustness of Text-to-image Diffusion Models against Real-world Attacks
Hongcheng Gao, Hao Zhang, Yinpeng Dong, Zhijie Deng
16 Jun 2023

Revealing the Blind Spot of Sentence Encoder Evaluation by HEROS
Workshop on Representation Learning for NLP (RepL4NLP), 2023
Cheng-Han Chiang, Yung-Sung Chuang, James R. Glass, Hung-yi Lee
08 Jun 2023

MAWSEO: Adversarial Wiki Search Poisoning for Illicit Online Promotion
IEEE Symposium on Security and Privacy (IEEE S&P), 2023
Zilong Lin, Zhengyi Li, Xiaojing Liao, Luyi Xing, Xiaozhong Liu
22 Apr 2023

Stealing the Decoding Algorithms of Language Models
Conference on Computer and Communications Security (CCS), 2023
A. Naseh, Kalpesh Krishna, Mohit Iyyer, Amir Houmansadr
08 Mar 2023

Training-free Lexical Backdoor Attacks on Language Models
The Web Conference (WWW), 2023
Yujin Huang, Terry Yue Zhuo, Xingliang Yuan, Han Hu, Lizhen Qu, Chunyang Chen
08 Feb 2023

Protecting Language Generation Models via Invisible Watermarking
International Conference on Machine Learning (ICML), 2023
Xuandong Zhao, Yu-Xiang Wang, Lei Li
06 Feb 2023

TextShield: Beyond Successfully Detecting Adversarial Sentences in Text Classification
International Conference on Learning Representations (ICLR), 2023
Lingfeng Shen, Ze Zhang, Haiyun Jiang, Ying-Cong Chen
03 Feb 2023

Model Extraction Attack against Self-supervised Speech Models
Tsung-Yuan Hsu, Chen-An Li, Tung-Yu Wu, Hung-yi Lee
29 Nov 2022

UPTON: Preventing Authorship Leakage from Public Text Release via Data Poisoning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Ziyao Wang, Thai Le, Dongwon Lee
17 Nov 2022

Preserving Semantics in Textual Adversarial Attacks
European Conference on Artificial Intelligence (ECAI), 2022
David Herel, Hugo Cisneros, Tomas Mikolov
08 Nov 2022

Extracted BERT Model Leaks More Information than You Think!
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Xuanli He, Chen Chen, Lingjuan Lyu, Xingliang Yuan
21 Oct 2022

Distillation-Resistant Watermarking for Model Protection in NLP
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Xuandong Zhao, Lei Li, Yu-Xiang Wang
07 Oct 2022

CATER: Intellectual Property Protection on Text Generation APIs via Conditional Watermarks
Neural Information Processing Systems (NeurIPS), 2022
Xuanli He, Xingliang Yuan, Yi Zeng, Lingjuan Lyu, Fangzhao Wu, Jiwei Li, R. Jia
19 Sep 2022

I Know What You Trained Last Summer: A Survey on Stealing Machine Learning Models and Defences
ACM Computing Surveys (ACM CSUR), 2022
Daryna Oliynyk, Rudolf Mayer, Andreas Rauber
16 Jun 2022

Edge Security: Challenges and Issues
Xin Jin, Charalampos Katsis, Fan Sang, Jiahao Sun, A. Kundu, Ramana Rao Kompella
14 Jun 2022

A Word is Worth A Thousand Dollars: Adversarial Attack on Tweets Fools Stock Predictions
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Yong Xie, Dakuo Wang, Pin-Yu Chen, Jinjun Xiong, Sijia Liu, Oluwasanmi Koyejo
01 May 2022

A Girl Has A Name, And It's ... Adversarial Authorship Attribution for Deobfuscation
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Wanyue Zhai, Jonathan Rusert, Zubair Shafiq, P. Srinivasan
22 Mar 2022

On Robust Prefix-Tuning for Text Classification
International Conference on Learning Representations (ICLR), 2022
Zonghan Yang, Yang Liu
19 Mar 2022

A Survey of Adversarial Defences and Robustness in NLP
Shreyansh Goyal, Sumanth Doddapaneni, Mitesh M. Khapra, B. Ravindran
12 Mar 2022

Threats to Pre-trained Language Models: Survey and Taxonomy
Shangwei Guo, Chunlong Xie, Jiwei Li, Lingjuan Lyu, Tianwei Zhang
14 Feb 2022

Fooling MOSS Detection with Pretrained Language Models
International Conference on Information and Knowledge Management (CIKM), 2022
Stella Biderman, Edward Raff
19 Jan 2022

Protecting Intellectual Property of Language Generation APIs with Lexical Watermark
AAAI Conference on Artificial Intelligence (AAAI), 2021
Xuanli He, Xingliang Yuan, Lingjuan Lyu, Fangzhao Wu, Chenguang Wang
05 Dec 2021

Virtual Data Augmentation: A Robust and General Framework for Fine-tuning Pre-trained Models
Kun Zhou, Wayne Xin Zhao, Sirui Wang, Fuzheng Zhang, Wei Wu, Ji-Rong Wen
13 Sep 2021

Student Surpasses Teacher: Imitation Attack for Black-Box NLP APIs
International Conference on Computational Linguistics (COLING), 2021
Xingliang Yuan, Xuanli He, Lingjuan Lyu, Zhuang Li, Gholamreza Haffari
29 Aug 2021

Killing One Bird with Two Stones: Model Extraction and Attribute Inference Attacks against BERT-based APIs
Chen Chen, Xuanli He, Lingjuan Lyu, Fangzhao Wu
23 May 2021

Membership Inference Attacks on Knowledge Graphs
Yu Wang, Lifu Huang, Philip S. Yu, Lichao Sun
16 Apr 2021