ProtoDash: Fast Interpretable Prototype Selection

Abstract

In this paper we propose ProtoDash, an efficient algorithm for selecting prototypical examples from complex datasets. Our work builds on the learn to criticize (L2C) work by Kim et al. (2016) and generalizes it in at least two notable ways: 1) ProtoDash not only selects prototypes for a given sparsity level m, but also associates with each prototype a non-negative weight indicative of its importance. 2) ProtoDash not only finds prototypical examples for a dataset X, but can also find prototypical examples from X that best represent another dataset Y, where X and Y belong to the same feature space. We provide approximation guarantees for our algorithm by showing that the problem is weakly submodular, and we demonstrate its efficacy on diverse domains, namely retail, digit recognition (MNIST), and the latest publicly available 40 health questionnaires obtained from the Center for Disease Control (CDC) website maintained by the US Dept. of Health. We validate the results quantitatively as well as qualitatively, based on expert feedback and recently published scientific studies on public health.
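To make the two generalizations concrete, the following is a minimal sketch of greedy prototype selection with non-negative weights. It is not the authors' implementation. The abstract does not state the objective, so the sketch assumes a Gaussian kernel and a kernel mean-matching objective, w·mu - 0.5·w·K·w with w >= 0, where mu holds the mean kernel similarity of each candidate in X to the points of Y. The function names (gaussian_kernel, fit_nonneg_weights, select_prototypes), the gamma parameter, and the projected-gradient weight solver are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def fit_nonneg_weights(K_S, mu_S, n_iter=500):
    """Maximize w @ mu_S - 0.5 * w @ K_S @ w subject to w >= 0.

    Assumption: plain projected gradient ascent, chosen for brevity.
    """
    step = 1.0 / np.linalg.eigvalsh(K_S)[-1]  # 1 / largest eigenvalue of K_S
    w = np.zeros(len(mu_S))
    for _ in range(n_iter):
        # Gradient step on the objective, then projection onto w >= 0.
        w = np.maximum(w + step * (mu_S - K_S @ w), 0.0)
    return w


def select_prototypes(X, Y, m, gamma=1.0):
    """Greedily pick at most m rows of X, with weights, to represent Y."""
    K = gaussian_kernel(X, X, gamma)                # candidate-to-candidate similarity
    mu = gaussian_kernel(X, Y, gamma).mean(axis=1)  # mean similarity of each candidate to Y
    selected = []
    weights = np.zeros(0)
    for _ in range(m):
        idx = np.array(selected, dtype=int)
        # Gradient of the assumed objective at the current weights.
        grad = mu - K[:, idx] @ weights
        grad[idx] = -np.inf                         # never re-select a prototype
        j = int(np.argmax(grad))
        if grad[j] <= 0:                            # no candidate improves the objective
            break
        selected.append(j)
        idx = np.array(selected, dtype=int)
        # Re-fit all weights on the enlarged prototype set.
        weights = fit_nonneg_weights(K[np.ix_(idx, idx)], mu[idx])
    return selected, weights
```

Calling select_prototypes(X, X, m) covers the single-dataset case, while passing a different Y selects prototypes from X that represent Y. The returned weights are non-negative by construction, so they can be read as relative importances under the assumptions above.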
