Axiomatic Explanations for Visual Search, Retrieval, and Similarity Learning
Visual search, recommendation, and contrastive similarity learning power a broad range of technologies that affect billions of users worldwide. The best-performing approaches are often complex and difficult to interpret, and there are several competing techniques for explaining a search engine's behavior. We show that the theory of fair credit assignment provides a unique axiomatic solution that generalizes several existing recommendation- and metric-explainability techniques in the literature. Using this formalism, we determine in which regimes existing approaches fall short of fairness, and we provide variations that are fair in more situations and that handle counterfactual information. More specifically, we show that existing approaches implicitly approximate second-order Shapley-Taylor indices, and we use this perspective to extend CAM, GradCAM, LIME, SHAP, SBSM, and other methods to search engines. These extensions can extract pairwise correspondences between images from trained black-box models. We also introduce a fast kernel-based method for estimating Shapley-Taylor indices that requires orders of magnitude fewer function evaluations to converge. Finally, we evaluate these methods and show that these game-theoretic measures yield more consistent explanations for image similarity architectures.
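To make the central quantity concrete, below is a minimal Monte Carlo sketch of second-order Shapley-Taylor interaction indices for a black-box set function, using the permutation characterization of the top-order index (Sundararajan et al., 2020). This is not the paper's fast kernel-based estimator; it is the kind of brute-force sampling baseline that such an estimator aims to improve on. The value function `v`, player count `n`, and sampling budget are hypothetical stand-ins; in the search setting, `v` would wrap a similarity model evaluated on masked image regions.

```python
# Monte Carlo sketch (NOT the paper's kernel method) of order-2
# Shapley-Taylor interaction indices for a black-box set function v.
import itertools
import random

def shapley_taylor_2nd_order(v, n, num_permutations=2000, seed=0):
    """Estimate all order-2 Shapley-Taylor indices of v over players 0..n-1.

    For a sampled permutation pi and a pair S = {i, j}, let T be the set of
    players preceding *both* i and j in pi. The top-order Shapley-Taylor
    index is the expectation over pi of the discrete second derivative
        v(T + {i, j}) - v(T + {i}) - v(T + {j}) + v(T),
    which reduces to the classic Shapley sampling formula when |S| = 1.
    """
    rng = random.Random(seed)
    players = list(range(n))
    est = {pair: 0.0 for pair in itertools.combinations(players, 2)}
    for _ in range(num_permutations):
        pi = players[:]
        rng.shuffle(pi)
        pos = {p: idx for idx, p in enumerate(pi)}
        for (i, j) in est:
            cut = min(pos[i], pos[j])      # first appearance of S = {i, j}
            T = frozenset(pi[:cut])        # players preceding all of S
            delta = (v(T | {i, j}) - v(T | {i})
                     - v(T | {j}) + v(T))  # discrete 2nd derivative
            est[(i, j)] += delta / num_permutations
    return est

if __name__ == "__main__":
    # Hypothetical value function: players 0 and 1 pay off only jointly,
    # mimicking a pair of image regions that match only together.
    def v(coalition):
        return 1.0 if {0, 1} <= set(coalition) else 0.0

    for pair, val in sorted(shapley_taylor_2nd_order(v, n=4).items()):
        print(pair, round(val, 3))
```

On the toy game above, the estimator assigns an interaction of 1.0 to the pair (0, 1) and 0.0 to every other pair, which matches the exact indices. The cost of this baseline grows quickly with the number of players and the sampling budget, which is the convergence problem the paper's kernel-based estimator targets.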