Encoding Visual Attributes in Capsules for Explainable Medical Diagnoses

Abstract

In high-risk domains, understanding the reasons behind machine-generated predictions is vital for assessing trust. In this study, we introduce a novel multi-task capsule network design for explainable medical image-based diagnosis. Our proposed explainable capsule architecture, called X-Caps, encodes high-level visual attributes within the vectors of its capsules, then forms predictions based on these interpretable features. Since these attributes are independent, we modify the dynamic routing algorithm to route information from child capsules to parent capsules independently. To further increase the explainability of our method, we propose to train the network directly on a distribution of expert labels, rather than on the average of those labels as done in previous studies. This provides a meaningful metric of model confidence, penalizing over- and under-confidence, supervised directly by the agreement among human experts. In our example high-risk application of lung cancer diagnosis, we conduct experiments on a large and diverse dataset of over 1000 CT scans, where our proposed X-Caps, a relatively small 2D capsule network, significantly outperforms the previous state-of-the-art deep dual-path dense 3D CNN in predicting visual attribute scores while also improving diagnostic accuracy. To the best of our knowledge, this is the first study to investigate capsule networks for making predictions based on human-level interpretable visual attributes in general, and their application to explainable medical image diagnosis in particular.
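
The label-distribution supervision described above can be made concrete with a short sketch. The snippet below is a minimal PyTorch illustration, not the authors' implementation: it assumes a 5-point malignancy scale scored by several radiologists (as in LIDC-IDRI-style annotations), builds the empirical distribution of their scores, and compares it to the predicted distribution with a KL divergence. The function name expert_distribution_loss and all tensor shapes are hypothetical.

import torch
import torch.nn.functional as F

def expert_distribution_loss(pred_logits: torch.Tensor,
                             expert_scores: torch.Tensor,
                             num_classes: int = 5) -> torch.Tensor:
    """KL divergence between the predicted score distribution and the
    empirical distribution of expert labels (hypothetical sketch).

    pred_logits:   (batch, num_classes) raw network outputs.
    expert_scores: (batch, num_raters) integer scores in [1, num_classes],
                   one per radiologist.
    """
    # Empirical target distribution: the fraction of raters voting for
    # each score, rather than a single averaged label.
    one_hot = F.one_hot(expert_scores - 1, num_classes).float()  # (B, R, C)
    target = one_hot.mean(dim=1)                                 # (B, C)

    # Matching the full distribution penalizes a model that is more
    # confident (or less confident) than the raters themselves.
    log_pred = F.log_softmax(pred_logits, dim=-1)
    return F.kl_div(log_pred, target, reduction="batchmean")

# Example: two nodules, each scored 1-5 by four radiologists.
logits = torch.randn(2, 5)
scores = torch.tensor([[3, 3, 4, 3],
                       [5, 4, 5, 5]])
loss = expert_distribution_loss(logits, scores)

The exact loss used in the paper may differ; the point of the sketch is that supervising against the raters' empirical distribution, rather than its mean, makes the network's output confidence directly comparable to inter-expert agreement.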
