Equalized odds postprocessing under imperfect group information
Most approaches aiming to ensure a model's fairness with respect to a protected attribute (such as gender or race) assume that the true value of the attribute is known for every data point. In this paper, we ask to what extent fairness interventions can be effective when only imperfect information about the protected attribute is available. In particular, we study the prominent equalized odds method of Hardt et al. (2016) under a perturbation of the protected attribute. We identify conditions on the perturbation under which running equalized odds with the perturbed attribute still provably reduces the bias of a classifier. We also study the error of the resulting classifier. We show empirically, and prove for a special case, that under our identified conditions the error does not suffer from a perturbation of the protected attribute.
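To make the setting concrete, here is a minimal sketch (not the authors' code) of the experiment the abstract describes: equalized odds postprocessing fitted with a noisy protected attribute, then evaluated against the true attribute. It uses fairlearn's ThresholdOptimizer as one off-the-shelf implementation of Hardt et al. (2016); the synthetic data, the independent flip-noise model, and the flip probability `eps` are illustrative assumptions, not details from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.postprocessing import ThresholdOptimizer

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): features X, labels y,
# and a binary protected attribute a.
n = 5000
a = rng.integers(0, 2, size=n)                  # true protected attribute
X = rng.normal(size=(n, 5)) + a[:, None] * 0.5  # group-dependent features
y = (X[:, 0] + 0.3 * a + rng.normal(size=n) > 0).astype(int)

# Perturbed attribute: each entry of a is flipped independently with
# probability eps (a simple noise model chosen for this sketch).
eps = 0.2
a_noisy = np.where(rng.random(n) < eps, 1 - a, a)

base = LogisticRegression().fit(X, y)

# Equalized odds postprocessing fitted with the *perturbed* attribute.
post = ThresholdOptimizer(
    estimator=base,
    constraints="equalized_odds",
    prefit=True,
    predict_method="predict_proba",
)
post.fit(X, y, sensitive_features=a_noisy)
y_hat = post.predict(X, sensitive_features=a_noisy, random_state=0)

# Evaluate bias with respect to the *true* attribute:
# the gap in true-positive rates across groups.
for group in (0, 1):
    mask = (a == group) & (y == 1)
    print(f"TPR (true group {group}): {y_hat[mask].mean():.3f}")
```

The question the paper studies is when the remaining TPR/FPR gap under the true attribute is smaller than that of the unprocessed classifier, even though the postprocessing only ever saw the noisy attribute.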