Generalization Performance of Ensemble Clustering: From Theory to Algorithm

1 June 2025
Xu Zhang
Haoye Qiu
Weixuan Liang
Hui Liu
Junhui Hou
Yuheng Jia
Main: 8 Pages
7 Figures
Bibliography: 3 Pages
7 Tables
Appendix: 16 Pages
Abstract

Ensemble clustering has demonstrated great success in practice; however, its theoretical foundations remain underexplored. This paper examines the generalization performance of ensemble clustering, focusing on generalization error, excess risk and consistency. We derive a convergence rate of generalization error bound and excess risk bound both of $\mathcal{O}(\sqrt{\frac{\log n}{m}}+\frac{1}{\sqrt{n}})$, with $n$ and $m$ being the numbers of samples and base clusterings. Based on this, we prove that when $m$ and $n$ approach infinity and $m$ is significantly larger than $\log n$, i.e., $m,n\to \infty, m\gg \log n$, ensemble clustering is consistent. Furthermore, recognizing that $n$ and $m$ are finite in practice, the generalization error cannot be reduced to zero. Thus, by assigning varying weights to finite clusterings, we minimize the error between the empirical average clusterings and their expectation. From this, we theoretically demonstrate that to achieve better clustering performance, we should minimize the deviation (bias) of base clustering from its expectation and maximize the differences (diversity) among various base clusterings. Additionally, we derive that maximizing diversity is nearly equivalent to a robust (min-max) optimization model. Finally, we instantiate our theory to develop a new ensemble clustering algorithm. Compared with SOTA methods, our approach achieves average improvements of 6.1%, 7.3%, and 6.0% on 10 datasets w.r.t. NMI, ARI, and Purity. The code is available at this https URL.
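To get a feel for the stated rate, the sketch below evaluates the expression $\sqrt{\frac{\log n}{m}}+\frac{1}{\sqrt{n}}$ for a few sample sizes $n$ and ensemble sizes $m$. It is only an illustration: the function name `rate_term` and the chosen values are assumptions made here, the constant hidden by the $\mathcal{O}(\cdot)$ notation is ignored, and none of this is taken from the authors' code.

```python
import math


def rate_term(n: int, m: int) -> float:
    """Evaluate sqrt(log(n) / m) + 1 / sqrt(n).

    n: number of samples, m: number of base clusterings.
    The constant hidden by the O(.) notation is ignored (illustrative only).
    """
    if n < 2 or m < 1:
        raise ValueError("need n >= 2 and m >= 1")
    return math.sqrt(math.log(n) / m) + 1.0 / math.sqrt(n)


if __name__ == "__main__":
    # Hypothetical (n, m) pairs chosen only to contrast small and large m.
    for n, m in [(1_000, 10), (1_000, 1_000), (1_000_000, 10), (1_000_000, 100_000)]:
        print(f"n={n:>9,}  m={m:>7,}  rate term = {rate_term(n, m):.4f}")
```

With $m$ held fixed, the first term grows slowly as $n$ increases, which is why the consistency statement requires $m \gg \log n$ rather than only $n \to \infty$.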

@article{zhang2025_2506.02053,
  title={Generalization Performance of Ensemble Clustering: From Theory to Algorithm},
  author={Xu Zhang and Haoye Qiu and Weixuan Liang and Hui Liu and Junhui Hou and Yuheng Jia},
  journal={arXiv preprint arXiv:2506.02053},
  year={2025}
}