Uncovering Modality Discrepancy and Generalization Illusion for General-Purpose 3D Medical Segmentation

7 February 2026

Yichi Zhang

Feiyang Xiao

Le Xue

Wenbo Zhang

Gang Feng

Chenguang Zheng

Yuan Qi

Yuan Cheng

Zixin Hu

MedIm


Main:8 Pages

4 Figures

Bibliography:2 Pages

3 Tables

Appendix:2 Pages

Abstract

While emerging 3D medical foundation models are envisioned as versatile tools that offer general-purpose capabilities, their validation remains largely confined to regional and structural imaging, leaving a significant modality discrepancy unexplored. To provide a rigorous and objective assessment, we curate the UMD dataset, comprising 490 whole-body PET/CT and 464 whole-body PET/MRI scans ($\sim$675k 2D images, $\sim$12k 3D organ annotations), and conduct a comprehensive evaluation of representative 3D segmentation foundation models. Through intra-subject controlled comparisons of paired scans, we isolate imaging modality as the primary independent variable to evaluate model robustness in real-world applications. Our evaluation reveals a stark discrepancy between literature-reported benchmarks and real-world efficacy, particularly when transitioning from structural to functional domains. Such systemic failures underscore that current 3D foundation models are far from achieving truly general-purpose status, necessitating a paradigm shift toward multi-modal training and evaluation to bridge the gap between idealized benchmarking and comprehensive clinical utility. This dataset and analysis establish a foundation for future research to develop truly modality-agnostic medical foundation models.
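The intra-subject paired comparison described above can be illustrated with a minimal sketch. It assumes the Dice coefficient as the overlap metric (the abstract does not name the metric used) and a hypothetical data layout in which each subject carries a predicted and a reference mask for both the structural and the functional scan. The names `dice`, `paired_modality_gap`, and the dictionary keys are illustrative, not taken from the paper.

```python
from statistics import mean


def dice(pred, truth):
    """Dice overlap between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def paired_modality_gap(subjects):
    """Per-subject Dice difference (structural minus functional).

    Because both scans come from the same subject, the difference
    attributes the change in score to the imaging modality rather
    than to anatomy or cohort.
    """
    gaps = {}
    for subject_id, scans in subjects.items():
        structural = dice(scans["structural_pred"], scans["structural_truth"])
        functional = dice(scans["functional_pred"], scans["functional_truth"])
        gaps[subject_id] = structural - functional
    return gaps


# Toy masks (hypothetical values, not data from the paper).
toy_subjects = {
    "subj01": {
        "structural_pred": [1, 1, 1, 0],
        "structural_truth": [1, 1, 1, 0],
        "functional_pred": [1, 0, 0, 0],
        "functional_truth": [1, 1, 1, 0],
    },
    "subj02": {
        "structural_pred": [1, 1, 0, 0],
        "structural_truth": [1, 1, 0, 0],
        "functional_pred": [1, 1, 0, 0],
        "functional_truth": [1, 1, 0, 0],
    },
}

gaps = paired_modality_gap(toy_subjects)
mean_gap = mean(gaps.values())
```

A positive gap for a subject means the model segmented the structural scan better than the paired functional scan; averaging the gaps over subjects gives a single summary of the modality discrepancy under these assumptions.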
