ResearchTrend.AI
4
1

Data Curves Clustering Using Common Patterns Detection

Abstract

For the past decades we have experienced an enormous expansion of the accumulated data that humanity produces. Daily a numerous number of smart devices, usually interconnected over internet, produce vast, real-values datasets. Time series representing datasets from completely irrelevant domains such as finance, weather, medical applications, traffic control etc. become more and more crucial in human day life. Analyzing and clustering these time series, or in general any kind of curves, could be critical for several human activities. In the current paper, the new Curves Clustering Using Common Patterns (3CP) methodology is introduced, which applies a repeated pattern detection algorithm in order to cluster sequences according to their shape and the similarities of common patterns between time series, data curves and eventually any kind of discrete sequences. For this purpose, the Longest Expected Repeated Pattern Reduced Suffix Array (LERP-RSA) data structure has been used in combination with the All Repeated Patterns Detection (ARPaD) algorithm in order to perform highly accurate and efficient detection of similarities among data curves that can be used for clustering purposes and which also provides additional flexibility and features.

View on arXiv
Comments on this paper