DualProtoSeg: Simple and Efficient Design with Text- and Image-Guided Prototype Learning for Weakly Supervised Histopathology Image Segmentation

11 December 2025

Anh M. Vu

Khang P. Le

Trang T. K. Vo

Ha Thach

Huy Hung Nguyen

David Yang

Han H. Huynh

Quynh Nguyen

Tuan M. Pham

Tuan-Anh Le

Minh H. N. Le

Thanh-Huy Nguyen

Akash Awasthi

Chandra Mohan

Zhu Han

Hien Van Nguyen

Main: 10 pages, 4 figures; Bibliography: 3 pages; 4 tables

Abstract

Weakly supervised semantic segmentation (WSSS) in histopathology seeks to reduce annotation cost by learning from image-level labels, yet it remains limited by inter-class homogeneity, intra-class heterogeneity, and the region-shrinkage effect of CAM-based supervision. We propose a simple and effective prototype-driven framework that leverages vision-language alignment to improve region discovery under weak supervision. Our method integrates CoOp-style learnable prompt tuning to generate text-based prototypes and combines them with learnable image prototypes, forming a dual-modal prototype bank that captures both semantic and appearance cues. To address oversmoothing in ViT representations, we incorporate a multi-scale pyramid module that enhances spatial precision and improves localization quality. Experiments on the BCSS-WSSS benchmark show that our approach surpasses existing state-of-the-art methods, and detailed analyses demonstrate the benefits of text description diversity, context length, and the complementary behavior of text and image prototypes. These results highlight the effectiveness of jointly leveraging textual semantics and visual prototype learning for WSSS in digital pathology.
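As a rough illustration of the dual-modal prototype bank described above, the following minimal sketch scores patch features against text and image prototypes by cosine similarity. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the two toy class names, the 2-D feature vectors, the choice to take the maximum similarity over a class's prototypes, and the per-patch argmax. In the actual framework the text prototypes would come from prompt-tuned text embeddings and the patch features from the multi-scale ViT pyramid.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def class_scores(patch_feature, text_protos, image_protos):
    """Score one patch feature against a dual-modal prototype bank.

    text_protos / image_protos: dict mapping class name -> list of vectors.
    Assumption (not from the paper): a class score is the maximum
    similarity over all of that class's prototypes, whatever the modality.
    """
    scores = {}
    for cls in text_protos:
        bank = text_protos[cls] + image_protos.get(cls, [])
        scores[cls] = max(cosine(patch_feature, p) for p in bank)
    return scores


def segment(patch_grid, text_protos, image_protos):
    """Label each patch in a 2-D grid with its best-scoring class (hypothetical rule)."""
    labels = []
    for row in patch_grid:
        label_row = []
        for feat in row:
            scores = class_scores(feat, text_protos, image_protos)
            label_row.append(max(scores, key=scores.get))
        labels.append(label_row)
    return labels


# Toy data: hypothetical class names and hand-picked 2-D vectors.
text_protos = {"tumor": [[1.0, 0.0]], "stroma": [[0.0, 1.0]]}
image_protos = {"tumor": [[0.8, 0.6]], "stroma": [[-0.6, 0.8]]}
patch_grid = [
    [[0.9, 0.1], [0.1, 0.9]],
    [[0.7, 0.7], [-0.5, 0.5]],
]

for row in segment(patch_grid, text_protos, image_protos):
    print(row)
```

In this toy setup the patch `[0.7, 0.7]` is equally similar to both text prototypes, so only the image prototype separates the classes. This is meant as a small-scale picture of the complementary behavior the abstract attributes to the two prototype types, not as evidence for it.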
