Wasserstein projection distance for fairness testing of regression models
Fairness testing evaluates whether a model satisfies a specified fairness criterion across different groups, yet most research has focused on classification models, leaving regression models underexplored. This paper introduces a framework for fairness testing in regression models that leverages the Wasserstein distance to project data distributions, focusing on expectation-based criteria. After categorizing fairness criteria for regression, we derive a Wasserstein projection test statistic via a dual reformulation, together with asymptotic bounds and limiting distributions; these allow us to formulate both a hypothesis-testing procedure and an optimal data perturbation method that improves fairness while balancing accuracy. Experiments on synthetic data demonstrate that the proposed hypothesis-testing approach offers higher specificity than permutation-based tests. To illustrate its potential applications, we apply the framework to two case studies on real data, showing (1) statistically significant gender disparities across multiple models on student performance data, and (2) significant unfairness between pollution areas under multiple fairness criteria on housing price data, robust to different group divisions, with feature-level analysis identifying spatial and socioeconomic drivers.
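The permutation-based baseline that the abstract compares against can be sketched as follows: compute the 1-Wasserstein distance between the model's predictions for two groups, then estimate a p-value by repeatedly shuffling group labels. This is a minimal illustration under assumed synthetic prediction arrays, not the paper's projection test.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

# Hypothetical model predictions for two protected groups (synthetic placeholders).
preds_a = rng.normal(0.0, 1.0, size=300)
preds_b = rng.normal(0.3, 1.0, size=300)

# Observed fairness gap: 1-Wasserstein distance between group prediction distributions.
obs_gap = wasserstein_distance(preds_a, preds_b)

# Permutation test: under the null of no group disparity, labels are exchangeable.
pooled = np.concatenate([preds_a, preds_b])
n_a = len(preds_a)
n_perm = 1000
exceed = 0
for _ in range(n_perm):
    perm = rng.permutation(pooled)
    if wasserstein_distance(perm[:n_a], perm[n_a:]) >= obs_gap:
        exceed += 1

# Add-one smoothing keeps the estimated p-value strictly positive.
p_value = (exceed + 1) / (n_perm + 1)
print(f"gap={obs_gap:.3f}, p={p_value:.3f}")
```

The paper's Wasserstein projection statistic replaces this resampling loop with an asymptotic limiting distribution, which the experiments report as yielding higher specificity.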