
Who's responsible? Jointly quantifying the contribution of the learning algorithm and training data

Abstract

A learning algorithm A trained on a dataset D is revealed to have poor performance on some subpopulation at test time. Where should the responsibility for this lie? It can be argued that the data is responsible if, for example, training A on a more representative dataset D' would have improved the performance. But it can similarly be argued that A itself is at fault if training a different variant A' on the same dataset D would have improved performance. As ML becomes widespread and such failure cases more common, these types of questions are proving to be far from hypothetical. With this motivation in mind, in this work we provide a rigorous formulation of the joint credit assignment problem between a learning algorithm A and a dataset D. We propose Extended Shapley as a principled framework for this problem, and experiment empirically with how it can be used to address questions of ML accountability.
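
The two counterfactuals above (swap D for D', or swap A for A') can be made concrete with the classic Shapley value over two coarse "players", the algorithm and the data. This is only an illustrative sketch of the textbook Shapley value, not the paper's Extended Shapley. The accuracy numbers, the player names "algorithm" and "data", and the helper shapley_values are all hypothetical. A coalition lists which components have been switched to their alternative (A' and/or D'), and each player's credit is its marginal gain averaged over both orders of switching.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    players: Sequence[str], value: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley values: average each player's marginal gain over all orderings."""
    orders = list(permutations(players))
    totals = {p: 0.0 for p in players}
    for order in orders:
        coalition: FrozenSet[str] = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orders) for p, t in totals.items()}


# Hypothetical subgroup accuracies; keys say which components were swapped.
accuracy = {
    frozenset(): 0.60,                         # A trained on D
    frozenset({"algorithm"}): 0.70,            # A' trained on D
    frozenset({"data"}): 0.66,                 # A trained on D'
    frozenset({"algorithm", "data"}): 0.80,    # A' trained on D'
}

phi = shapley_values(["algorithm", "data"], lambda s: accuracy[s])
print(phi)
```

With these made-up numbers the algorithm receives about 0.12 and the data about 0.08, and the two credits sum to the total improvement of 0.20 (the efficiency property). Averaging over both switching orders is what keeps the answer from depending on which counterfactual one happens to consider first.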
