
When Are Learning Biases Equivalent? A Unifying Framework for Fairness, Robustness, and Distribution Shift

Main: 5 pages
Bibliography: 1 page
7 tables
Appendix: 3 pages
Abstract

Machine learning systems exhibit diverse failure modes: unfairness toward protected groups, brittleness to spurious correlations, and poor performance on minority subpopulations. These failures are typically studied in isolation by distinct research communities. We propose a unifying theoretical framework that characterizes when different bias mechanisms produce quantitatively equivalent effects on model performance. By formalizing biases as violations of conditional independence, measured information-theoretically, we prove formal equivalence conditions relating spurious correlations, subpopulation shift, class imbalance, and fairness violations. Our theory predicts that a spurious correlation of strength $\alpha$ produces the same worst-group accuracy degradation as a subpopulation imbalance ratio $r \approx (1+\alpha)/(1-\alpha)$ under feature overlap assumptions. Empirical validation across six datasets and three architectures confirms that the predicted equivalences hold to within 3% worst-group accuracy, enabling principled transfer of debiasing methods across problem domains. This work brings the fairness, robustness, and distribution-shift literatures under a common perspective.
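
The stated equivalence can be made concrete with a minimal sketch. The helper below is hypothetical and simply restates the abstract's formula $r \approx (1+\alpha)/(1-\alpha)$; it is not the paper's implementation, and the function name and interpretation of $\alpha$ as the fraction-based correlation strength are assumptions for illustration.

```python
def equivalent_imbalance_ratio(alpha: float) -> float:
    """Map a spurious-correlation strength alpha in [0, 1) to the
    predicted equivalent subpopulation imbalance ratio r, per the
    abstract's relation r ~ (1 + alpha) / (1 - alpha)."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    return (1.0 + alpha) / (1.0 - alpha)


# Example: a spurious correlation of strength alpha = 0.8 is predicted to
# degrade worst-group accuracy as much as a 9:1 group imbalance,
# since (1 + 0.8) / (1 - 0.8) = 9.
print(equivalent_imbalance_ratio(0.8))  # 9.0
```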
