Augmenting Chest X-ray Datasets with Non-Expert Annotations

Abstract

The advancement of machine learning algorithms in medical image analysis requires the expansion of training datasets. A popular and cost-effective approach is automated annotation extraction from free-text medical reports, primarily due to the high costs associated with expert clinicians annotating medical images, such as chest X-rays. However, it has been shown that the resulting datasets are susceptible to biases and shortcuts. Another strategy to increase the size of a dataset is crowdsourcing, a widely adopted practice in general computer vision with some success in medical image analysis. In a similar vein to crowdsourcing, we enhance two publicly available chest X-ray datasets by incorporating non-expert annotations. However, instead of using diagnostic labels, we annotate shortcuts in the form of tubes. We collect 3.5k chest drain annotations for NIH-CXR14, and 1k annotations for four different tube types in PadChest, and create the Non-Expert Annotations of Tubes in X-rays (NEATX) dataset. We train a chest drain detector with the non-expert annotations that generalizes well to expert labels. Moreover, we compare our annotations to those provided by experts and show "moderate" to "almost perfect" agreement. Finally, we present a pathology agreement study to raise awareness about the quality of ground truth annotations. We make our dataset available at this https URL and our code available at this https URL.
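The "moderate" to "almost perfect" wording corresponds to the Landis-Koch interpretation scale for Cohen's kappa, a standard chance-corrected statistic for inter-annotator agreement. As a minimal sketch (not the paper's actual evaluation code, and using made-up labels), kappa for two annotators' binary chest-drain labels can be computed as:

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    # Observed agreement: fraction of items where both annotators agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement by chance, from each annotator's label frequencies.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical labels: 1 = "chest drain present", 0 = "absent".
expert     = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
non_expert = [1, 0, 1, 0, 0, 0, 1, 1, 1, 1]
print(round(cohen_kappa(expert, non_expert), 2))  # → 0.58
```

On the Landis-Koch scale, 0.41–0.60 is "moderate" and 0.81–1.00 is "almost perfect" agreement.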

@article{cheplygina2025_2309.02244,
  title={Augmenting Chest X-ray Datasets with Non-Expert Annotations},
  author={Veronika Cheplygina and Cathrine Damgaard and Trine Naja Eriksen and Dovile Juodelyte and Amelia Jim{\'e}nez-S{\'a}nchez},
  journal={arXiv preprint arXiv:2309.02244},
  year={2025}
}