
PathDiff: Histopathology Image Synthesis with Unpaired Text and Mask Conditions

Mahesh Bhosale
Abdul Wasi
Yuanhao Zhai
Yunjie Tian
Samuel Border
Nan Xi
Pinaki Sarder
Junsong Yuan
David Doermann
Xuan Gong
Main: 8 Pages
14 Figures
Bibliography: 3 Pages
15 Tables
Appendix: 8 Pages
Abstract

Diffusion-based generative models have shown promise in synthesizing histopathology images to address data scarcity caused by privacy constraints. Diagnostic text reports provide high-level semantic descriptions, and masks offer fine-grained spatial structures essential for representing distinct morphological regions. However, public datasets lack paired text and mask data for the same histopathological images, limiting their joint use in image generation. This constraint restricts the ability to fully exploit the benefits of combining both modalities for enhanced control over semantics and spatial details. To overcome this, we propose PathDiff, a diffusion framework that effectively learns from unpaired mask-text data by integrating both modalities into a unified conditioning space. PathDiff allows precise control over structural and contextual features, generating high-quality, semantically accurate images. PathDiff also improves image fidelity, text-image alignment, and faithfulness, enhancing data augmentation for downstream tasks like nuclei segmentation and classification. Extensive experiments demonstrate its superiority over existing methods.

@article{bhosale2025_2506.23440,
  title={PathDiff: Histopathology Image Synthesis with Unpaired Text and Mask Conditions},
  author={Mahesh Bhosale and Abdul Wasi and Yuanhao Zhai and Yunjie Tian and Samuel Border and Nan Xi and Pinaki Sarder and Junsong Yuan and David Doermann and Xuan Gong},
  journal={arXiv preprint arXiv:2506.23440},
  year={2025}
}