
ORACLE: Explaining Feature Interactions in Neural Networks with ANOVA

Mohamed Jismy Aashik Rasool
Main: 8 pages; 5 figures; Bibliography: 2 pages; 14 tables; Appendix: 20 pages

Abstract

We introduce ORACLE, a framework for explaining neural networks on tabular data and scientific factorial designs. ORACLE summarizes a trained network's prediction surface with main effects and pairwise interactions by treating the network as a black-box response, discretizing the inputs onto a grid, and fitting an orthogonal factorial (ANOVA-style) surrogate: the $L^2$ orthogonal projection of the model response onto a finite-dimensional factorial subspace. A simple centering and $\mu$-rebalancing step then expresses this surrogate as main- and interaction-effect tables that remain faithful to the original model in the $L^2$ sense. The resulting grid-based interaction maps are easy to visualize, comparable across backbones, and directly aligned with classical design-of-experiments practice. On synthetic factorial benchmarks and low- to medium-dimensional tabular regression tasks, ORACLE more accurately recovers ground-truth interaction structure and hotspots than Monte Carlo SHAP-family interaction methods, as measured by ranking, localization, and cross-backbone stability. In latent image and text settings, ORACLE clarifies its scope: grid-based factorial surrogates are most effective when features admit an interpretable factorial structure, making ORACLE particularly well-suited to scientific and engineering workflows that require stable, DoE-style interaction summaries.
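
As a rough illustration of the grid-based idea described in the abstract, the sketch below applies a textbook two-factor ANOVA decomposition to a black-box function evaluated on a full grid. It is not the paper's implementation: the uniform grid weighting, the function name `factorial_decomposition`, and the toy model are all assumptions made for this example, and the $\mu$-rebalancing step is not reproduced because the abstract does not specify it.

```python
import numpy as np

def factorial_decomposition(model, grid_a, grid_b):
    """Two-factor ANOVA-style decomposition of a black-box model on a grid.

    Hypothetical helper (not from the paper). Assumes uniform weights on
    the grid cells, so every effect is a plain average over the grid.
    """
    # Evaluate the black-box response on the full factorial grid.
    A, B = np.meshgrid(grid_a, grid_b, indexing="ij")
    response = model(A, B)  # shape: (len(grid_a), len(grid_b))

    mu = response.mean()                   # grand mean
    main_a = response.mean(axis=1) - mu    # centered main effect of factor a
    main_b = response.mean(axis=0) - mu    # centered main effect of factor b
    # Whatever the additive part does not explain is the pairwise interaction.
    interaction = response - mu - main_a[:, None] - main_b[None, :]
    return mu, main_a, main_b, interaction

def toy_model(a, b):
    # Stand-in for a trained network: additive terms plus one a*b interaction.
    return 2.0 * a + b ** 2 + 3.0 * a * b

grid_a = np.linspace(-1.0, 1.0, 5)
grid_b = np.linspace(-1.0, 1.0, 5)
mu, main_a, main_b, inter = factorial_decomposition(toy_model, grid_a, grid_b)
```

On this symmetric grid the interaction table isolates the `3.0 * a * b` term, and each effect table averages to zero along every axis, which is the centering property that makes the components mutually orthogonal under uniform weights.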
