ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2204.04636
  4. Cited By
"That Is a Suspicious Reaction!": Interpreting Logits Variation to
  Detect NLP Adversarial Attacks

"That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks

10 April 2022
Edoardo Mosca
Shreyash Agarwal
Javier Rando
Georg Groh
    AAML
ArXivPDFHTML

Papers citing ""That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks"

9 / 9 papers shown
Title
Learning on LLM Output Signatures for gray-box LLM Behavior Analysis
Learning on LLM Output Signatures for gray-box LLM Behavior Analysis
Guy Bar-Shalom
Fabrizio Frasca
Derek Lim
Yoav Gelberg
Yftah Ziser
Ran El-Yaniv
Gal Chechik
Haggai Maron
62
0
0
18 Mar 2025
TextShield: Beyond Successfully Detecting Adversarial Sentences in Text
  Classification
TextShield: Beyond Successfully Detecting Adversarial Sentences in Text Classification
Lingfeng Shen
Ze Zhang
Haiyun Jiang
Ying Chen
AAML
23
5
0
03 Feb 2023
ADDMU: Detection of Far-Boundary Adversarial Examples with Data and
  Model Uncertainty Estimation
ADDMU: Detection of Far-Boundary Adversarial Examples with Data and Model Uncertainty Estimation
Fan Yin
Yao Li
Cho-Jui Hsieh
Kai-Wei Chang
AAML
58
4
0
22 Oct 2022
An Interpretability Evaluation Benchmark for Pre-trained Language Models
An Interpretability Evaluation Benchmark for Pre-trained Language Models
Ya-Ming Shen
Lijie Wang
Ying Chen
Xinyan Xiao
Jing Liu
Hua-Hong Wu
27
4
0
28 Jul 2022
Exploring Adversarial Attacks and Defenses in Vision Transformers
  trained with DINO
Exploring Adversarial Attacks and Defenses in Vision Transformers trained with DINO
Javier Rando
Nasib Naimi
Thomas Baumann
Max Mathys
AAML
18
5
0
14 Jun 2022
Certified Robustness to Adversarial Word Substitutions
Certified Robustness to Adversarial Word Substitutions
Robin Jia
Aditi Raghunathan
Kerem Göksel
Percy Liang
AAML
178
290
0
03 Sep 2019
Generating Natural Language Adversarial Examples
Generating Natural Language Adversarial Examples
M. Alzantot
Yash Sharma
Ahmed Elgohary
Bo-Jhang Ho
Mani B. Srivastava
Kai-Wei Chang
AAML
243
914
0
21 Apr 2018
Adversarial Example Generation with Syntactically Controlled Paraphrase
  Networks
Adversarial Example Generation with Syntactically Controlled Paraphrase Networks
Mohit Iyyer
John Wieting
Kevin Gimpel
Luke Zettlemoyer
AAML
GAN
185
711
0
17 Apr 2018
Adversarial examples in the physical world
Adversarial examples in the physical world
Alexey Kurakin
Ian Goodfellow
Samy Bengio
SILM
AAML
250
5,830
0
08 Jul 2016
1