Recovering Trees with Convex Clustering

Abstract

Convex clustering refers, for given $\left\{x_1, \dots, x_n\right\} \subset \mathbb{R}^p$, to the minimization of
\begin{eqnarray*} u(\gamma) & = & \underset{u_1, \dots, u_n}{\arg\min}\;\sum_{i=1}^{n}{\lVert x_i - u_i \rVert^2} + \gamma \sum_{i,j=1}^{n}{w_{ij} \lVert u_i - u_j\rVert}, \end{eqnarray*}
where $w_{ij} \geq 0$ is an affinity that quantifies the similarity between $x_i$ and $x_j$. We prove that if the affinities $w_{ij}$ reflect a tree structure in the $\left\{x_1, \dots, x_n\right\}$, then the convex clustering solution path reconstructs the tree exactly. The main technical ingredient implies the following combinatorial byproduct: for every set $\left\{x_1, \dots, x_n\right\} \subset \mathbb{R}^p$ of $n \geq 2$ distinct points, there exist at least $n/6$ points with the property that for any of these points $x$ there is a unit vector $v \in \mathbb{R}^p$ such that, when viewed from $x$, `most' points lie in the direction $v$:
\begin{eqnarray*} \frac{1}{n-1}\sum_{i=1 \atop x_i \neq x}^{n}{ \left\langle \frac{x_i - x}{\lVert x_i - x \rVert}, v \right\rangle} & \geq & \frac{1}{4}. \end{eqnarray*}
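
As a rough illustration of the two quantities in the abstract, the following NumPy sketch evaluates the convex clustering objective for a candidate $(u_1, \dots, u_n)$ and computes, for each point, the largest value the directional average can take over unit vectors $v$. The function names `convex_clustering_objective` and `directional_scores` are hypothetical and not taken from the paper. The sketch does not solve the minimization or trace the solution path. It uses the elementary fact that the average of $\langle d_i, v \rangle$ over unit vectors $v$ is maximized by taking $v$ parallel to the mean of the $d_i$, so the maximum equals the norm of that mean.

```python
import numpy as np


def convex_clustering_objective(x, u, w, gamma):
    """Evaluate sum_i ||x_i - u_i||^2 + gamma * sum_{i,j} w_ij ||u_i - u_j||.

    x, u: arrays of shape (n, p); w: array of shape (n, n); gamma: float.
    This only evaluates the objective; it does not minimize it.
    """
    fidelity = np.sum((x - u) ** 2)
    # Pairwise differences u_i - u_j, shape (n, n, p), via broadcasting.
    diffs = u[:, None, :] - u[None, :, :]
    pairwise = np.linalg.norm(diffs, axis=2)
    return fidelity + gamma * np.sum(w * pairwise)


def directional_scores(x):
    """For each point x_k, return the maximum over unit vectors v of
    (1 / (n - 1)) * sum_{i != k} <(x_i - x_k) / ||x_i - x_k||, v>.

    Assumes the rows of x are distinct points, as in the abstract.
    """
    n = x.shape[0]
    scores = np.empty(n)
    for k in range(n):
        d = np.delete(x, k, axis=0) - x[k]
        units = d / np.linalg.norm(d, axis=1, keepdims=True)
        # The best v is parallel to the mean unit direction, so the
        # maximal average inner product is the norm of that mean.
        scores[k] = np.linalg.norm(units.mean(axis=0))
    return scores
```

Counting the entries of `directional_scores(x)` that are at least `0.25` gives the number of points satisfying the inequality for that particular point set, which the abstract states is at least $n/6$.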
