Explainable post-training bias mitigation with distribution-based fairness metrics

Abstract
We develop a novel optimization framework with distribution-based fairness constraints for efficiently producing demographically blind, explainable models across a wide range of fairness levels. This is accomplished through post-processing, avoiding the need for retraining. Our framework, which is based on stochastic gradient descent, can be applied to a wide range of model types, with a particular emphasis on post-processing gradient-boosted decision trees. Building on previous work, we also design a broad class of interpretable global bias metrics compatible with our method. We empirically test our methodology on a variety of datasets and compare it to other methods.
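The page gives only the abstract, so the details of the optimization are not shown here. As a rough illustration of the kind of pipeline the abstract describes (not the authors' actual algorithm), the sketch below post-processes a fixed model's scores by stochastic gradient descent, penalizing a distribution-based fairness term: the Wasserstein-1 distance between the transformed score distributions of two demographic groups. The learned transform depends only on the score, so the post-processed model remains demographically blind at inference time. All names (`fit_postprocessor`, `wasserstein1`, `lambda_fair`) and the small three-parameter transform family are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch of SGD-based post-processing with a distribution-based
# fairness penalty. Assumes PyTorch and that both demographic groups are
# non-empty; not the authors' method.
import torch

def wasserstein1(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Wasserstein-1 distance between two 1-D empirical distributions,
    approximated by comparing their quantile functions on a shared grid."""
    n = min(a.numel(), b.numel())
    grid = torch.linspace(0.0, 1.0, n)
    return (torch.quantile(a, grid) - torch.quantile(b, grid)).abs().mean()

def fit_postprocessor(scores, group, lambda_fair=1.0, steps=500, lr=1e-2):
    """Learn a group-blind score transform t = scale*s + shift + bend*tanh(s)
    trading fidelity to the original scores against group-level bias."""
    s = torch.as_tensor(scores, dtype=torch.float32)
    g = torch.as_tensor(group, dtype=torch.bool)
    # (shift, scale_logit, bend); scale_logit chosen so softplus(0.5413) ~= 1,
    # i.e. the transform starts near the identity.
    params = torch.tensor([0.0, 0.5413, 0.0], requires_grad=True)
    opt = torch.optim.SGD([params], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        shift = params[0]
        scale = torch.nn.functional.softplus(params[1]) + 1e-3  # keep scale > 0
        bend = params[2]
        t = scale * s + shift + bend * torch.tanh(s)  # depends on the score only
        fidelity = ((t - s) ** 2).mean()              # stay close to the base model
        fairness = wasserstein1(t[g], t[~g])          # distribution-based bias metric
        (fidelity + lambda_fair * fairness).backward()
        opt.step()
    return params.detach()
```

Sweeping `lambda_fair` from 0 upward traces out post-processed models at a range of fairness levels without retraining the base model, which is the trade-off the abstract's "wide range of fairness levels" refers to; the fairness penalty itself could be swapped for any differentiable distribution-based metric.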
@article{franks2025_2504.01223,
  title   = {Explainable post-training bias mitigation with distribution-based fairness metrics},
  author  = {Ryan Franks and Alexey Miroshnikov},
  journal = {arXiv preprint arXiv:2504.01223},
  year    = {2025}
}