
Fairness Interventions: A Study in AI Explainability

Main: 26 Pages
7 Figures
Bibliography: 1 Page
7 Tables
Appendix: 7 Pages
Abstract

This paper presents a philosophical and experimental study of fairness interventions in AI classification, centered on the explainability of corrective methods. We argue that ensuring fairness requires not only satisfying a target criterion, but also explaining which variables constrain its realization. When corrections are used to mitigate advantage transparently, they must remain sensitive to the distribution of true labels. To illustrate this approach, we built FairDream, a fairness package whose mechanism is made transparent for lay users: it increases the weight the model gives to errors on disadvantaged groups. While a user may intend to achieve Demographic Parity with the correction method, experiments show that FairDream tends towards Equalized Odds, revealing a conservative bias inherent to the data environment. We clarify the relationship between these fairness criteria, analyze FairDream's reweighting process, and compare its trade-offs with closely related GridSearch models. Finally, we justify the normative preference for Equalized Odds via an epistemological interpretation of the results, using their proximity to Simpson's paradox. The paper thus unites normative, epistemological, and empirical explanations of fairness interventions to ensure transparency for users.
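
To make the two criteria named in the abstract concrete, here is a minimal sketch in plain Python. It is not FairDream's code. The function names (`group_rates`, `demographic_parity_gap`, `equalized_odds_gap`) and the toy data are assumptions made for illustration. It uses the standard definitions: Demographic Parity compares selection rates across groups, while Equalized Odds compares true-positive and false-positive rates.

```python
from collections import defaultdict

def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, TPR and FPR for binary labels (0/1)."""
    counts = defaultdict(lambda: {"n": 0, "sel": 0, "pos": 0, "tp": 0, "neg": 0, "fp": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        c["n"] += 1
        c["sel"] += p          # predicted positive
        if t == 1:
            c["pos"] += 1
            c["tp"] += p       # true positive
        else:
            c["neg"] += 1
            c["fp"] += p       # false positive
    rates = {}
    for g, c in counts.items():
        rates[g] = {
            "selection_rate": c["sel"] / c["n"],
            "tpr": c["tp"] / c["pos"] if c["pos"] else float("nan"),
            "fpr": c["fp"] / c["neg"] if c["neg"] else float("nan"),
        }
    return rates

def demographic_parity_gap(rates):
    """Largest difference in selection rate between any two groups."""
    vals = [r["selection_rate"] for r in rates.values()]
    return max(vals) - min(vals)

def equalized_odds_gap(rates):
    """Largest between-group difference in TPR or FPR, whichever is bigger."""
    tprs = [r["tpr"] for r in rates.values()]
    fprs = [r["fpr"] for r in rates.values()]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))

# Hypothetical toy data: the true-label base rate differs between groups
# (3/4 positives in A, 1/4 in B) and the classifier is perfectly accurate.
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A"] * 4 + ["B"] * 4

rates = group_rates(y_true, y_pred, groups)
print(demographic_parity_gap(rates))  # 0.5
print(equalized_odds_gap(rates))      # 0.0
```

In this toy case the classifier satisfies Equalized Odds exactly yet shows a Demographic Parity gap of 0.5, because the two groups differ in their distribution of true labels. This shows only how the two definitions can come apart. It does not reproduce the paper's experiments.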
