A tissue and cell-level annotated H&E and PD-L1 histopathology image dataset in non-small cell lung cancer

Joey Spronck
Leander van Eekelen
Dominique van Midden
Joep Bogaerts
Leslie Tessier
Valerie Dechering
Muradije Demirel-Andishmand
Gabriel Silva de Souza
Roland Nemeth
Enrico Munari
Giuseppe Bogina
Ilaria Girolami
Albino Eccher
Balazs Acs
Ceren Boyaci
Natalie Klubickova
Monika Looijen-Salamon
Shoko Vos
Francesco Ciompi
Main: 10 Pages
4 Figures
Bibliography: 2 Pages
Abstract

The tumor immune microenvironment (TIME) in non-small cell lung cancer (NSCLC) histopathology contains morphological and molecular characteristics predictive of immunotherapy response. Computational quantification of TIME characteristics, for example through cell detection and tissue segmentation, can support biomarker development. However, currently available digital pathology datasets of NSCLC for the development of cell detection or tissue segmentation algorithms are limited in scope, lack annotations of clinically prevalent metastatic sites, and forgo molecular information such as PD-L1 immunohistochemistry (IHC). To fill this gap, we introduce the IGNITE data toolkit, a multi-stain, multi-centric, and multi-scanner dataset of annotated NSCLC whole-slide images. We publicly release 887 fully annotated regions of interest from 155 unique patients across three complementary tasks: (i) multi-class semantic segmentation of tissue compartments in H&E-stained slides, with 16 classes spanning primary and metastatic NSCLC, (ii) nuclei detection, and (iii) PD-L1 positive tumor cell detection in PD-L1 IHC slides. To the best of our knowledge, this is the first public NSCLC dataset with manual annotations of H&E in metastatic sites and PD-L1 IHC.
