Fair Inference On Outcomes

Many data analysis tasks, such as solving prediction problems or inferring cause-effect relationships, can be framed as statistical inference on models with outcome variables. This type of inference has been very successful in a variety of applications, including image and video analysis, speech recognition, machine translation, autonomous vehicle control, game playing, and validating hypotheses in the empirical sciences. As statistical and machine learning models become an increasingly ubiquitous part of our lives, policymakers, regulators, and advocates have expressed concerns about the harmful impact of deploying models that encode the discriminatory biases of their creators. A growing community is now addressing issues of fairness and transparency in data analysis, in part by defining, analyzing, and mitigating harmful effects of algorithmic bias from a variety of perspectives and frameworks [3, 4, 6, 7, 8, 18]. In this paper, we consider the problem of fair statistical inference involving outcome variables. Examples include classification and regression problems, and estimating treatment effects in randomized trials or observational data. The issue of fairness arises in such problems when some covariates or treatments are "sensitive", in the sense that they have the potential to create discrimination. In this paper, we argue that the presence of discrimination in our setting can be formalized in a sensible way as the presence of an effect of a sensitive covariate on the outcome along certain causal pathways, a view that generalizes [16]. We discuss a number of complications this view creates for classical statistical inference, and suggest workarounds based on recent work in causal and semi-parametric inference.
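To make the pathway-level view concrete, the following toy sketch illustrates one such quantity, the natural direct effect of a sensitive attribute on an outcome when part of its influence flows through a mediator. This is not the paper's estimator; the variable names, linear models, and simulated data are illustrative assumptions chosen only to show how an effect along a particular causal pathway can be isolated.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy setup (assumed, not from the paper): a sensitive attribute A affects
# outcome Y directly and through a mediator M. If the direct path A -> Y is
# deemed "unfair", its magnitude can be quantified by the natural direct
# effect (NDE) via the standard mediation formula.

rng = np.random.default_rng(0)
n = 50_000
A = rng.binomial(1, 0.5, n)                  # sensitive covariate
M = 1.0 * A + rng.normal(size=n)             # mediator on an allowed pathway
Y = 2.0 * A + 1.5 * M + rng.normal(size=n)   # outcome; 2.0 is the true direct effect

# Fit simple outcome and mediator models (correctly specified by construction here).
out = LinearRegression().fit(np.column_stack([A, M]), Y)
med = LinearRegression().fit(A.reshape(-1, 1), M)

# Mediation formula for the NDE: E[Y(a=1, M(a=0))] - E[Y(a=0, M(a=0))],
# i.e. vary A along the direct path while holding the mediator at its
# distribution under the baseline level of A.
m_under_a0 = med.predict(np.zeros((n, 1)))
y_a1_m0 = out.predict(np.column_stack([np.ones(n), m_under_a0]))
y_a0_m0 = out.predict(np.column_stack([np.zeros(n), m_under_a0]))
nde = (y_a1_m0 - y_a0_m0).mean()
print(f"Estimated natural direct effect: {nde:.2f} (true value 2.0)")
```

Under a fairness constraint of the kind discussed in the paper, an analyst would restrict or remove the effect transmitted along the disallowed pathway rather than the total effect of the sensitive covariate; the sketch above only shows how such a pathway-specific quantity can be estimated in a simple parametric setting.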