Explainable Clustering Beyond Worst-Case Guarantees

3 November 2024

Maximilian Fleissner

Maedeh Zarvandi

Debarghya Ghoshdastidar

Main: 26 Pages

2 Figures

Bibliography: 3 Pages

2 Tables

Abstract

We study the explainable clustering problem first posed by Moshkovitz, Dasgupta, Rashtchian, and Frost (ICML 2020). The goal of explainable clustering is to fit an axis-aligned decision tree with $K$ leaves and minimal clustering cost (where every leaf is a cluster). The fundamental theoretical question in this line of work is the \textit{price of explainability}, defined as the ratio between the clustering cost of the tree and the optimal cost. Numerous papers have provided worst-case guarantees on this quantity. For $K$-medians, it has recently been shown that the worst-case price of explainability is $\Theta(\log K)$. While this settles the matter from a data-agnostic point of view, two important questions remain unanswered: Are tighter guarantees possible for well-clustered data? And can we trust decision trees to recover underlying cluster structures? In this paper, we place ourselves in a statistical setting of mixture models to answer both questions. We prove that better guarantees are indeed feasible for well-clustered data. Our algorithm takes as input a mixture model and constructs a tree in data-independent time. We then extend our analysis to kernel clustering, deriving new guarantees that significantly improve over existing worst-case bounds.
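To make the price of explainability concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a toy two-dimensional dataset, a depth-1 axis-aligned tree ($K = 2$ leaves), and a hand-picked reference partition standing in for the optimal $K$-medians clustering. All function names (`kmedians_cost`, `threshold_tree_labels`, `price_of_explainability`) are hypothetical and chosen for illustration only.

```python
from statistics import median


def kmedians_cost(points, labels):
    """Sum of l1 distances from each point to the coordinate-wise
    median of its assigned cluster (the K-medians objective)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        # The coordinate-wise median minimizes the l1 cost of a cluster.
        center = [median(coord) for coord in zip(*members)]
        total += sum(abs(x - c) for p in members for x, c in zip(p, center))
    return total


def threshold_tree_labels(points, feature, threshold):
    """Depth-1 axis-aligned tree: leaf 0 if x[feature] <= threshold, else leaf 1."""
    return [0 if p[feature] <= threshold else 1 for p in points]


def price_of_explainability(points, tree_labels, reference_labels):
    """Ratio of the tree's clustering cost to the cost of a reference partition.
    ASSUMPTION: reference_labels stands in for the optimal clustering."""
    return kmedians_cost(points, tree_labels) / kmedians_cost(points, reference_labels)


# Toy data: two well-separated groups (illustrative, not from the paper).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
reference = [0, 0, 0, 1, 1, 1]

good_tree = threshold_tree_labels(points, feature=0, threshold=3.0)
bad_tree = threshold_tree_labels(points, feature=0, threshold=0.5)

print(price_of_explainability(points, good_tree, reference))  # → 1.0
print(price_of_explainability(points, bad_tree, reference))   # → 3.0
```

On this well-separated toy data, a single cut between the two groups reproduces the reference partition exactly (ratio 1.0), while a poorly placed cut triples the cost. This only illustrates the ratio being studied; it says nothing about the guarantees proved in the paper.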
