To What Extent Do Token-Level Representations from Pathology Foundation Models Improve Dense Prediction?

3 February 2026

Weiming Chen

Xitong Ling

Xidong Wang

Zhenyang Cai

Yijia Guo

Mingxi Fu

Ziyi Zeng

Minxi Ouyang

Jiawen Li

Yizhi Wang

Tian Guan

Benyou Wang

Yonghong He

Main: 8 pages, 8 figures, Bibliography: 3 pages, 93 tables, Appendix: 95 pages

Abstract

Pathology foundation models (PFMs) have advanced rapidly and are becoming a common backbone for downstream clinical tasks, offering strong transferability across tissues and institutions. However, for dense prediction (e.g., segmentation), practical deployment still lacks a clear, reproducible understanding of how different PFMs behave across datasets and how adaptation choices affect performance and stability. We present PFM-DenseBench, a large-scale benchmark for dense pathology prediction that evaluates 17 PFMs across 18 public segmentation datasets. Under a unified protocol, we systematically assess PFMs with multiple adaptation and fine-tuning strategies and derive practice-oriented findings on when and why different PFMs and tuning choices succeed or fail across heterogeneous datasets. We release containers, configs, and dataset cards to enable reproducible evaluation and informed PFM selection for real-world dense pathology tasks. Project website: this https URL
