Explainable post-training bias mitigation with distribution-based fairness metrics

1 April 2025

Ryan Franks

A. Miroshnikov

Konstandinos Kotsiopoulos


Main: 39 pages · Bibliography: 6 pages · 4 figures · 3 tables

Abstract

We develop a novel optimization framework with distribution-based fairness constraints for efficiently producing demographically blind, explainable models across a wide range of fairness levels. This is accomplished through post-processing, avoiding the need for retraining. Our framework, which is based on stochastic gradient descent, can be applied to a wide range of model types, with a particular emphasis on the post-processing of gradient-boosted decision trees. Additionally, we design a broad class of interpretable global bias metrics compatible with our method by building on previous work. We empirically test our methodology on a variety of datasets and compare it to other methods.
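To make "distribution-based fairness metric" concrete, here is a minimal sketch of one such quantity: the 1-Wasserstein distance between the model score distributions of two groups. The abstract does not define the paper's metrics, so the choice of this distance, the function name `wasserstein1_bias`, and the `n_quantiles` parameter are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def wasserstein1_bias(scores, group, n_quantiles=1000):
    """Approximate the 1-Wasserstein distance between the score
    distributions of group 0 and group 1 (hypothetical helper).

    In one dimension, W1 equals the integral over quantile levels of
    the absolute difference between the two quantile functions; here
    that integral is approximated on a midpoint grid.
    """
    scores = np.asarray(scores, dtype=float)
    group = np.asarray(group)
    # Midpoint quantile levels in (0, 1).
    levels = (np.arange(n_quantiles) + 0.5) / n_quantiles
    q0 = np.quantile(scores[group == 0], levels)
    q1 = np.quantile(scores[group == 1], levels)
    return float(np.mean(np.abs(q0 - q1)))

# Usage: group 1 receives the same scores as group 0, shifted by 0.2.
rng = np.random.default_rng(0)
base = rng.uniform(size=500)
scores = np.concatenate([base, base + 0.2])
group = np.concatenate([np.zeros(500, dtype=int), np.ones(500, dtype=int)])
bias = wasserstein1_bias(scores, group)
```

Unlike a gap measured at a single classification threshold, a distance of this kind compares the whole score distributions, and it is computed here only to evaluate bias; the model itself can remain demographically blind because the group label is not one of its inputs.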

