Data clustering: a fundamental method in data science and management

25 December 2024

Joaquín Torres-Sospedra

Main: 15 pages, 22 figures. Bibliography: 3 pages.

Abstract

This paper explores the critical role of data clustering in data science, emphasizing its methodologies, tools, and diverse applications. Traditional techniques, such as partitional and hierarchical clustering, are analyzed alongside advanced approaches such as data stream, density-based, graph-based, and model-based clustering for handling complex structured datasets. The paper highlights key principles underpinning clustering, outlines widely used tools and frameworks, introduces the workflow of clustering in data science, discusses challenges in practical implementation, and examines various applications of clustering. By focusing on these foundations and applications, the discussion underscores clustering's transformative potential. The paper concludes with insights into future research directions, emphasizing clustering's role in driving innovation and enabling data-driven decision-making.
