ProtoDash: Fast Interpretable Prototype Selection
In this paper we propose ProtoDash, an efficient algorithm for selecting prototypical examples from complex datasets. Our work builds on the learn to criticize (L2C) work by Kim et al. (2016) and generalizes it in at least two notable ways: 1) ProtoDash not only selects prototypes for a given sparsity level, but also associates a non-negative weight with each of them, indicative of that prototype's importance. 2) ProtoDash not only finds prototypical examples for a single dataset, but can also find prototypical examples from one dataset that best represent another dataset, where both datasets belong to the same feature space. We provide approximation guarantees for our algorithm by showing that the problem is weakly submodular, and demonstrate its efficacy on diverse domains, namely retail, digit recognition (MNIST), and the 40 latest publicly available health questionnaires obtained from the Center for Disease Control (CDC) website maintained by the US Dept. of Health. We validate the results quantitatively as well as qualitatively, based on expert feedback and recently published scientific studies on public health.
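To make the two generalizations concrete, the following is a minimal sketch of weighted prototype selection, not the authors' implementation. It rests on assumptions the abstract does not state: a Gaussian kernel, a quadratic kernel mean-matching objective w·mu - 0.5 w·K·w, greedy selection by largest gradient, and a simple coordinate-ascent solver for the non-negative weights. All function and parameter names (`gaussian_kernel`, `fit_nonneg_weights`, `select_prototypes`, `sigma`) are hypothetical.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) similarity between two points (assumed kernel choice)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def fit_nonneg_weights(K, mu, iters=200):
    """Maximize w.mu - 0.5 * w.K.w subject to w >= 0 by cyclic coordinate ascent."""
    n = len(mu)
    w = [0.0] * n
    for _ in range(iters):
        for j in range(n):
            # Best value of w[j] with the other weights fixed, clipped at zero.
            rest = sum(K[j][l] * w[l] for l in range(n) if l != j)
            w[j] = max(0.0, (mu[j] - rest) / K[j][j])
    return w


def select_prototypes(target, candidates, m, sigma=1.0):
    """Greedily pick up to m weighted prototypes from `candidates` to represent `target`.

    Returns a list of (candidate_index, weight) pairs.
    """
    # mu[j]: average similarity of candidate j to the target dataset.
    mu = [sum(gaussian_kernel(t, c, sigma) for t in target) / len(target)
          for c in candidates]
    # K: pairwise similarities among candidates.
    K = [[gaussian_kernel(a, b, sigma) for b in candidates] for a in candidates]

    chosen, weights = [], []
    for _ in range(min(m, len(candidates))):
        best_j, best_gain = None, 0.0
        for j in range(len(candidates)):
            if j in chosen:
                continue
            # Gradient of the objective with respect to w[j] at the current weights.
            gain = mu[j] - sum(K[j][i] * w for i, w in zip(chosen, weights))
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None:
            break  # no remaining candidate improves the objective
        chosen.append(best_j)
        # Re-fit all weights on the enlarged prototype set.
        K_sub = [[K[a][b] for b in chosen] for a in chosen]
        mu_sub = [mu[a] for a in chosen]
        weights = fit_nonneg_weights(K_sub, mu_sub)
    return list(zip(chosen, weights))
```

Passing the same list as `target` and `candidates` corresponds to summarizing a single dataset, while passing two different lists corresponds to choosing prototypes from one dataset that best represent another. Under these assumptions, a larger weight suggests that the prototype stands in for a larger share of the target data.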