
Hidden in Plain Sight -- Class Competition Focuses Attribution Maps

8 pages (main) + 18 pages (appendix) + 2 pages (bibliography), 23 figures, 13 tables
Abstract

Attribution methods reveal which input features a neural network uses for a prediction, adding transparency to its decisions. A common problem is that these attributions appear unspecific, highlighting both important and irrelevant features. We revisit the common attribution pipeline and observe that using logits as the attribution target is a main cause of this phenomenon. We show that the solution is in plain sight: considering distributions of attributions over multiple classes, using existing attribution methods, yields specific and fine-grained attributions. On common benchmarks, including the grid-pointing game and randomization-based sanity checks, this improves the performance of 18 attribution methods by up to 2x across 7 architectures, independent of the model architecture.
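To make the core idea concrete, here is a minimal sketch of what "attributions over multiple classes" could look like in practice. It assumes per-class attribution maps from any existing method and contrasts the target class against the others via a per-pixel softmax; the function name and the exact normalization are illustrative assumptions, not the paper's verbatim formulation.

```python
import numpy as np

def class_contrastive_attribution(per_class_attr, target):
    """Contrast the target class's attribution map against the other
    classes' maps via a per-pixel softmax over the class axis.

    This is a hypothetical sketch: instead of attributing a single
    logit, it uses the distribution of attributions across classes,
    so a pixel scores high only where the target class dominates.

    per_class_attr: array of shape (num_classes, H, W), one attribution
    map per class (from any existing attribution method).
    target: index of the class of interest.
    """
    # Numerically stable softmax across the class axis.
    a = per_class_attr - per_class_attr.max(axis=0, keepdims=True)
    weights = np.exp(a) / np.exp(a).sum(axis=0, keepdims=True)
    # Weight for the target class: near 1 where it clearly dominates
    # the competing classes, near uniform where no class stands out.
    return weights[target]
```

Pixels that all classes attribute to equally (e.g. background texture) receive a uniform, low score, while pixels where the target class's attribution stands out are emphasized, which matches the intuition of class competition focusing the map.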
