arXiv:2204.14226

Recommendations on test datasets for evaluating AI solutions in pathology

21 April 2022
A. Homeyer
Christian Geißler
L. O. Schwen
Falk Zakrzewski
Theodore Evans
K. Strohmenger
Max Westphal
R. D. Bülow
Michael Kargl
Aray Karjauv
Isidre Munné-Bertran
Charles Retzlaff
Adrià Romero-López
Tomasz Soltysinski
M. Plass
Rita Carvalho
Peter Steinbach
Yu-Chia Lan
Nassim Bouteldja
D. Haber
Mateo Rojas-Carulla
A. Vafaei Sadr
Matthias Kraft
Daniel Krüger
Rutger Fick
Tobias Lang
P. Boor
Heimo Müller
P. Hufnagl
N. Zerbe
Abstract

Artificial intelligence (AI) solutions that automatically extract information from digital histology images have shown great promise for improving pathological diagnosis. Prior to routine use, it is important to evaluate their predictive performance and obtain regulatory approval. This assessment requires appropriate test datasets. However, compiling such datasets is challenging and specific recommendations are missing. A committee of various stakeholders, including commercial AI developers, pathologists, and researchers, discussed key aspects and conducted extensive literature reviews on test datasets in pathology. Here, we summarize the results and derive general recommendations for the collection of test datasets. We address several questions: Which and how many images are needed? How to deal with low-prevalence subsets? How can potential bias be detected? How should datasets be reported? What are the regulatory requirements in different countries? The recommendations are intended to help AI developers demonstrate the utility of their products and to help regulatory agencies and end users verify reported performance measures. Further research is needed to formulate criteria for sufficiently representative test datasets so that AI solutions can operate with less user intervention and better support diagnostic workflows in the future.
