Density based Spatial Clustering of Lines via Probabilistic Generation of Neighbourhood

Akanksha Das
Malay Bhattacharyya
Main: 9 pages
1 figure
Bibliography: 1 page
4 tables
Abstract

Density-based spatial clustering of points in $\mathbb{R}^n$ has a myriad of applications in a variety of industries. We generalise this problem to the density-based clustering of lines in high-dimensional spaces, keeping in mind that there exists no valid distance measure for lines that satisfies the triangle inequality. In this paper, we design a clustering algorithm that generates, for each line, a customised neighbourhood of fixed volume (given as a parameter), shaped by an optional parameter in the form of a continuous probability density function. The algorithm is not sensitive to outliers and can effectively identify noise in the data using a cardinality parameter. One of its pivotal applications is clustering data points in $\mathbb{R}^n$ with missing entries while utilising domain knowledge of the respective data. In particular, the proposed algorithm is able to cluster $n$-dimensional data points that contain at least $(n-1)$-dimensional information. We illustrate the neighbourhoods for standard probability distributions with continuous probability density functions and demonstrate the effectiveness of our algorithm on various synthetic and real-world datasets (e.g., rail and road networks). The experimental results also highlight its application in clustering incomplete data.
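To make the link between incomplete data and lines concrete, the following is a minimal illustrative sketch, not the paper's algorithm. It assumes that a point in $\mathbb{R}^n$ with exactly one missing entry can be read as an axis-parallel line (the missing coordinate is free), and it uses the minimum Euclidean distance between two lines as one natural candidate measure. The helper names `point_to_line` and `min_line_distance` are hypothetical. The three lines at the end show this candidate violating the triangle inequality, which is one way to see why a purely distance-based neighbourhood is problematic for lines.

```python
import math


def point_to_line(x):
    """Map a point with exactly one missing entry (None) to a line.

    Assumption for illustration: the missing coordinate is unconstrained,
    so the point becomes a line parallel to that coordinate axis.
    Returns (anchor, direction) as lists of floats.
    """
    missing = [i for i, v in enumerate(x) if v is None]
    if len(missing) != 1:
        raise ValueError("expected exactly one missing entry")
    k = missing[0]
    anchor = [0.0 if v is None else float(v) for v in x]
    direction = [1.0 if i == k else 0.0 for i in range(len(x))]
    return anchor, direction


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def min_line_distance(l1, l2):
    """Smallest Euclidean distance between two lines in R^n."""
    (p, u), (q, v) = l1, l2
    w = [pi - qi for pi, qi in zip(p, q)]
    a, b, c = _dot(u, u), _dot(u, v), _dot(v, v)
    d, e = _dot(u, w), _dot(v, w)
    den = a * c - b * b
    if den < 1e-12:
        # Parallel lines: fix s = 0 and project p onto the second line.
        s, t = 0.0, e / c
    else:
        # Minimise |w + s*u - t*v|^2 over s and t.
        s = (b * e - c * d) / den
        t = (a * e - b * d) / den
    diff = [wi + s * ui - t * vi for wi, ui, vi in zip(w, u, v)]
    return math.sqrt(_dot(diff, diff))


# Three incomplete points in R^3, each missing one coordinate.
A = point_to_line([None, 0, 0])  # the x-axis
B = point_to_line([0, 0, None])  # the z-axis
C = point_to_line([None, 0, 5])  # parallel to the x-axis, at z = 5

d_ab = min_line_distance(A, B)
d_bc = min_line_distance(B, C)
d_ac = min_line_distance(A, C)

# B intersects both A and C, yet A and C are a positive distance apart.
print(d_ab, d_bc, d_ac)
```

Here `B` touches both `A` and `C`, so the two shorter "distances" sum to zero while `A` and `C` remain separated, and the triangle inequality fails.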
