v1v2v3v4 (latest)

GRILL: Restoring Gradient Signal in Ill-Conditioned Layers for More Effective Adversarial Attacks on Autoencoders

6 May 2025

Chethan Krishnamurthy Ramanaik

ArXiv (abs)PDF HTML Github

Main:7 Pages

19 Figures

Bibliography:2 Pages

3 Tables

Appendix:10 Pages

Abstract

Adversarial robustness of deep autoencoders (AEs) has received less attention than that of discriminative models, although their compressed latent representations induce ill-conditioned mappings that can amplify small input perturbations and destabilize reconstructions. Existing white-box attacks for AEs, which optimize norm-bounded adversarial perturbations to maximize output damage, often stop at suboptimal attacks. We observe that this limitation stems from vanishing adversarial loss gradients during backpropagation through ill-conditioned layers, caused by near-zero singular values in their Jacobians. To address this issue, we introduce GRILL, a technique that locally restores gradient signals in ill-conditioned layers, enabling more effective norm-bounded attacks. Through extensive experiments across multiple AE architectures, considering both sample-specific and universal attacks under both standard and adaptive attack settings, we show that GRILL significantly increases attack effectiveness, leading to a more rigorous evaluation of AE robustness. Beyond AEs, we provide empirical evidence that modern multimodal architectures with encoder-decoder structures exhibit similar vulnerabilities under GRILL.

View on arXiv

Comments on this paper