
Learning Accurate Segmentation Purely from Self-Supervision

Zuyao You
Zuxuan Wu
Yu-Gang Jiang
Main: 11 pages, 4 figures, 4 tables; Bibliography: 3 pages
Abstract

Accurately segmenting objects without any manual annotations remains one of the core challenges in computer vision. In this work, we introduce Selfment, a fully self-supervised framework that segments foreground objects directly from raw images without human labels, pretrained segmentation models, or any post-processing. Selfment first constructs patch-level affinity graphs from self-supervised features and applies NCut to obtain an initial coarse foreground--background separation. We then introduce Iterative Patch Optimization (IPO), a feature-space refinement procedure that progressively enforces spatial coherence and semantic consistency through iterative patch clustering. The refined masks are subsequently used as supervisory signals to train a lightweight segmentation head with contrastive and region-consistency objectives, allowing the model to learn stable and transferable object representations. Despite its simplicity and complete absence of manual supervision, Selfment sets new state-of-the-art (SoTA) results across multiple benchmarks. It achieves substantial improvements in F_max over previous unsupervised saliency detection methods on ECSSD (+4.0%), HKU-IS (+4.6%), and PASCAL-S (+5.7%). Moreover, without any additional fine-tuning, Selfment demonstrates remarkable zero-shot generalization to camouflaged object detection tasks (e.g., 0.910 S_m on CHAMELEON and 0.792 F_β^ω on CAMO), outperforming all existing unsupervised approaches and even rivaling fully supervised SoTA methods.
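The first stage described above (a patch-level affinity graph partitioned with NCut) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the affinity threshold `tau`, the mean-based split of the Fiedler vector, and the smaller-side-is-foreground heuristic are all assumptions for the sake of the example.

```python
import numpy as np

def ncut_foreground_mask(feats, tau=0.2):
    """Coarse foreground/background split via a two-way normalized cut
    on a patch affinity graph (simplified sketch; the paper's exact
    affinity construction and thresholding may differ).

    feats: (N, D) array of per-patch self-supervised features.
    Returns a boolean mask of length N (True = foreground guess).
    """
    # Cosine-similarity affinity; weak edges are damped (not zeroed)
    # so the graph stays connected.
    f = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    W = f @ f.T
    W = np.where(W > tau, W, 1e-5)

    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt

    # The eigenvector of the second-smallest eigenvalue (the Fiedler
    # vector) yields the relaxed two-way NCut partition.
    _, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]
    mask = fiedler > fiedler.mean()

    # Assumed heuristic: the smaller partition is the foreground,
    # since objects usually cover fewer patches than background.
    if mask.sum() > (~mask).sum():
        mask = ~mask
    return mask
```

On two well-separated feature clusters, the Fiedler vector is nearly piecewise constant, so thresholding it at its mean recovers the two groups.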
