v1v2 (latest)

Clustering via Self-Supervised Diffusion

6 July 2025

ArXiv (abs)PDF HTML Github

Main:8 Pages

11 Figures

Bibliography:4 Pages

2 Tables

Appendix:4 Pages

Abstract

Diffusion models, widely recognized for their success in generative tasks, have not yet been applied to clustering. We introduce Clustering via Diffusion (CLUDI), a self-supervised framework that combines the generative power of diffusion models with pre-trained Vision Transformer features to achieve robust and accurate clustering. CLUDI is trained via a teacher-student paradigm: the teacher uses stochastic diffusion-based sampling to produce diverse cluster assignments, which the student refines into stable predictions. This stochasticity acts as a novel data augmentation strategy, enabling CLUDI to uncover intricate structures in high-dimensional data. Extensive evaluations on challenging datasets demonstrate that CLUDI achieves state-of-the-art performance in unsupervised classification, setting new benchmarks in clustering robustness and adaptability to complex data distributions. Our code is available atthis https URL.

View on arXiv

Comments on this paper