A Distribution Testing Approach to Clustering Distributions
Gunjan Kumar
Yash Pote
Jonathan Scarlett
Main: 9 Pages
Bibliography: 3 Pages
2 Tables
Appendix: 14 Pages
Abstract
We study the following distribution clustering problem: given a hidden partition of distributions into two groups, such that the distributions within each group are identical and the two distributions associated with the two clusters are at least a given distance apart in total variation, the goal is to recover the partition. We establish upper and lower bounds on the sample complexity for two fundamental cases: (1) when one of the clusters' distributions is known, and (2) when both are unknown. Our upper and lower bounds characterize the sample complexity's dependence on the domain size, the number of distributions, the size of one of the clusters, and the distance. In particular, we achieve tightness with respect to (up to an factor) for all regimes.
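To make the problem setup concrete, the following is a minimal Python sketch of case (1), where one cluster's distribution is known. It is a naive plug-in baseline written for illustration, not the algorithm analyzed in the paper: each distribution is assigned to a cluster by comparing its empirical total variation distance to the known distribution against half the separation. All names (`tv_distance`, `empirical`, `naive_cluster_known`, `demo`), the example distributions, and the sample size are assumptions chosen for this illustration.

```python
import random
from collections import Counter


def tv_distance(p, q):
    """Total variation distance between two distributions on the same finite domain."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def empirical(samples, n):
    """Empirical distribution over the domain {0, ..., n-1}."""
    counts = Counter(samples)
    m = len(samples)
    return [counts[i] / m for i in range(n)]


def naive_cluster_known(sample_sets, known, eps):
    """Naive baseline (an assumption, not the paper's method): label a
    distribution 0 if its empirical distribution is within eps/2 of the
    known distribution in total variation, and 1 otherwise."""
    n = len(known)
    return [
        0 if tv_distance(empirical(samples, n), known) < eps / 2 else 1
        for samples in sample_sets
    ]


def demo(seed=0):
    """Draw samples from a hidden two-cluster partition and try to recover it."""
    rng = random.Random(seed)
    known = [0.25, 0.25, 0.25, 0.25]   # known cluster distribution (illustrative)
    other = [0.55, 0.15, 0.15, 0.15]   # unknown cluster distribution, TV distance 0.3 from `known`
    hidden = [0, 1, 0, 0, 1, 0]        # hidden partition to be recovered
    m = 5000                           # samples per distribution (illustrative, far from optimal)
    sample_sets = [
        rng.choices(range(len(known)), weights=(known if c == 0 else other), k=m)
        for c in hidden
    ]
    return hidden, naive_cluster_known(sample_sets, known, eps=0.3)


if __name__ == "__main__":
    hidden, recovered = demo()
    print("hidden:   ", hidden)
    print("recovered:", recovered)
```

This baseline estimates every distribution in full, so it uses far more samples than necessary; the bounds in the paper concern how few samples suffice for recovering the partition.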
