Hidden in Plain Sight -- Class Competition Focuses Attribution Maps

10 March 2025

Main: 8 pages, Bibliography: 2 pages, Appendix: 18 pages, 23 figures, 13 tables

Abstract

Attribution methods reveal which input features a neural network uses for a prediction, adding transparency to its decisions. A common problem is that these attributions appear unspecific, highlighting both important and irrelevant features. We revisit the common attribution pipeline and observe that using logits as the attribution target is a main cause of this phenomenon. We show that the solution is in plain sight: considering distributions of attributions over multiple classes, computed with existing attribution methods, yields specific and fine-grained attributions. On common benchmarks, including the grid-pointing game and randomization-based sanity checks, this improves the performance of 18 attribution methods across 7 architectures by up to 2x, independent of model architecture.
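
To make the idea of comparing attributions across classes concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the paper's implementation: it assumes a toy linear model, gradient-times-input as the base attribution method, and a per-feature softmax over classes as the "distribution of attributions". The function names, the weights, and the `temperature` parameter are all hypothetical.

```python
import math

def grad_times_input(weights, x):
    """Per-class attributions for a linear model: a[c][i] = W[c][i] * x[i].

    For a linear model the gradient of logit c with respect to x is W[c],
    so this is gradient-times-input with each logit as the target.
    """
    return [[w * xi for w, xi in zip(row, x)] for row in weights]

def class_contrastive(attributions, target, temperature=1.0):
    """Assumed contrast step: softmax over classes for every feature.

    Returns, per feature, the share of attribution mass that the target
    class receives relative to all classes. A feature that every class
    uses equally ends up at the chance level 1 / num_classes.
    """
    num_features = len(attributions[0])
    shares = []
    for i in range(num_features):
        column = [a[i] / temperature for a in attributions]
        peak = max(column)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in column]
        shares.append(exps[target] / sum(exps))
    return shares

# Hypothetical 3-class model with 4 input features.
# Feature 0 matters equally to every class; feature 1 separates class 0.
W = [
    [2.0, 1.0, 0.0, 1.0],   # class 0 (the target)
    [2.0, -1.0, 0.0, 0.0],  # class 1
    [2.0, 0.0, 1.0, 0.0],   # class 2
]
x = [1.0, 1.0, 1.0, 1.0]

raw = grad_times_input(W, x)          # one attribution row per class
contrastive = class_contrastive(raw, target=0)

top_raw = max(range(len(x)), key=lambda i: raw[0][i])
top_contrastive = max(range(len(x)), key=lambda i: contrastive[i])
print("top feature, logit attribution:", top_raw)
print("top feature, class-contrastive:", top_contrastive)
```

In this toy setup the logit-based attribution for class 0 ranks feature 0 highest, even though that feature contributes identically to every class and so says nothing about why class 0 was chosen. After the contrast step, feature 0 falls to the chance level of one third and the class-specific feature 1 ranks first.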
