Uncovering Modality Discrepancy and Generalization Illusion for General-Purpose 3D Medical Segmentation

Yichi Zhang
Feiyang Xiao
Le Xue
Wenbo Zhang
Gang Feng
Chenguang Zheng
Yuan Qi
Yuan Cheng
Zixin Hu
Main: 8 Pages
4 Figures
Bibliography: 2 Pages
3 Tables
Appendix: 2 Pages
Abstract

While emerging 3D medical foundation models are envisioned as versatile tools that offer general-purpose capabilities, their validation remains largely confined to regional and structural imaging, leaving a significant modality discrepancy unexplored. To provide a rigorous and objective assessment, we curate the UMD dataset, comprising 490 whole-body PET/CT and 464 whole-body PET/MRI scans (approximately 675k 2D images and approximately 12k 3D organ annotations), and conduct a comprehensive evaluation of representative 3D segmentation foundation models. Through intra-subject controlled comparisons of paired scans, we isolate imaging modality as the primary independent variable to evaluate model robustness in real-world applications. Our evaluation reveals a stark discrepancy between literature-reported benchmarks and real-world efficacy, particularly when transitioning from structural to functional domains. Such systemic failures underscore that current 3D foundation models are far from achieving truly general-purpose status, necessitating a paradigm shift toward multi-modal training and evaluation to bridge the gap between idealized benchmarking and comprehensive clinical utility. This dataset and analysis establish a foundation for future research to develop truly modality-agnostic medical foundation models.
