Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2204.04636
Cited By
"That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks
10 April 2022
Edoardo Mosca
Shreyash Agarwal
Javier Rando
Georg Groh
AAML
Re-assign community
ArXiv
PDF
HTML
Papers citing
""That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks"
9 / 9 papers shown
Title
Learning on LLM Output Signatures for gray-box LLM Behavior Analysis
Guy Bar-Shalom
Fabrizio Frasca
Derek Lim
Yoav Gelberg
Yftah Ziser
Ran El-Yaniv
Gal Chechik
Haggai Maron
62
0
0
18 Mar 2025
TextShield: Beyond Successfully Detecting Adversarial Sentences in Text Classification
Lingfeng Shen
Ze Zhang
Haiyun Jiang
Ying Chen
AAML
23
5
0
03 Feb 2023
ADDMU: Detection of Far-Boundary Adversarial Examples with Data and Model Uncertainty Estimation
Fan Yin
Yao Li
Cho-Jui Hsieh
Kai-Wei Chang
AAML
58
4
0
22 Oct 2022
An Interpretability Evaluation Benchmark for Pre-trained Language Models
Ya-Ming Shen
Lijie Wang
Ying Chen
Xinyan Xiao
Jing Liu
Hua-Hong Wu
27
4
0
28 Jul 2022
Exploring Adversarial Attacks and Defenses in Vision Transformers trained with DINO
Javier Rando
Nasib Naimi
Thomas Baumann
Max Mathys
AAML
18
5
0
14 Jun 2022
Certified Robustness to Adversarial Word Substitutions
Robin Jia
Aditi Raghunathan
Kerem Göksel
Percy Liang
AAML
178
290
0
03 Sep 2019
Generating Natural Language Adversarial Examples
M. Alzantot
Yash Sharma
Ahmed Elgohary
Bo-Jhang Ho
Mani B. Srivastava
Kai-Wei Chang
AAML
243
914
0
21 Apr 2018
Adversarial Example Generation with Syntactically Controlled Paraphrase Networks
Mohit Iyyer
John Wieting
Kevin Gimpel
Luke Zettlemoyer
AAML
GAN
185
711
0
17 Apr 2018
Adversarial examples in the physical world
Alexey Kurakin
Ian Goodfellow
Samy Bengio
SILM
AAML
250
5,830
0
08 Jul 2016
1