arXiv: 2104.07061

Exact and Approximate Hierarchical Clustering Using A*

14 April 2021
Craig S. Greenberg
S. Macaluso
Nicholas Monath
Kumar Avinava Dubey
Patrick Flaherty
Manzil Zaheer
Amr Ahmed
Kyle Cranmer
Andrew McCallum
Abstract

Hierarchical clustering is a critical task in numerous domains. Many approaches are based on heuristics and the properties of the resulting clusterings are studied post hoc. However, in several applications, there is a natural cost function that can be used to characterize the quality of the clustering. In those cases, hierarchical clustering can be seen as a combinatorial optimization problem. To that end, we introduce a new approach based on A* search. We overcome the prohibitively large search space by combining A* with a novel trellis data structure. This combination results in an exact algorithm that scales beyond previous state of the art, from a search space with $10^{12}$ trees to $10^{15}$ trees, and an approximate algorithm that improves over baselines, even in enormous search spaces that contain more than $10^{1000}$ trees. We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks. We describe how our method provides significantly improved theoretical bounds on the time and space complexity of A* for clustering.
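
To give a feel for the search-space sizes quoted in the abstract, the following is a minimal sketch that counts candidate trees. It assumes the search space consists of rooted binary trees over n labeled leaves, whose count is the double factorial (2n-3)!!; the abstract itself does not state this formula. The function names `count_binary_trees` and `smallest_n_exceeding` are hypothetical helpers introduced for illustration only and are not taken from the paper.

```python
def count_binary_trees(n_leaves: int) -> int:
    """Count rooted binary trees over n labeled leaves: (2n-3)!!.

    Assumption (not stated in the abstract): the search space is the set
    of binary hierarchical clusterings of n distinct elements.
    """
    if n_leaves < 1:
        raise ValueError("n_leaves must be a positive integer")
    count = 1
    # Multiply the odd factors 3, 5, ..., 2n-3 (empty product for n <= 2).
    for k in range(3, 2 * n_leaves - 2, 2):
        count *= k
    return count


def smallest_n_exceeding(threshold: int) -> int:
    """Smallest number of leaves whose tree count is strictly above threshold."""
    n = 1
    while count_binary_trees(n) <= threshold:
        n += 1
    return n


if __name__ == "__main__":
    for exponent in (12, 15, 1000):
        n = smallest_n_exceeding(10 ** exponent)
        print(f"more than 10^{exponent} trees from n = {n} leaves")
```

Under this assumption, the jump from $10^{12}$ to $10^{15}$ trees corresponds to only a couple of additional elements, which illustrates why exhaustive enumeration is impractical and a pruned search such as A* is attractive.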
