JumpReLU: A Retrofit Defense Strategy for Adversarial Attacks

7 April 2019

Papers citing "JumpReLU: A Retrofit Defense Strategy for Adversarial Attacks"


Title
Rethinking Evaluation of Sparse Autoencoders through the Representation of Polysemous Words | Gouki Minegishi, Hiroki Furuta, Yusuke Iwasawa, Y. Matsuo | - | 49 1 0 | 09 Jan 2025
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models | Javier Ferrando, Oscar Obeso, Senthooran Rajamanoharan, Neel Nanda | - | 77 10 0 | 21 Nov 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models | Daking Rai, Yilun Zhou, Shi Feng, Abulhair Saparov, Ziyu Yao | - | 75 19 0 | 02 Jul 2024
Kryptonite: An Adversarial Attack Using Regional Focus | Yogesh Kulkarni, Krisha Bhambani | AAML | 19 3 0 | 23 Aug 2021
Adversarial examples in the physical world | Alexey Kurakin, Ian Goodfellow, Samy Bengio | SILM, AAML | 257 5,833 0 | 08 Jul 2016