
Sequence Graph Transform (SGT): A Feature Embedding Function for Sequence Data Mining (Extended Version)

Abstract

Sequence feature embedding is a challenging task because sequences are unstructured: they are arbitrary strings of arbitrary length. Existing methods extract short-term dependencies efficiently but typically suffer from computational issues when extracting long-term dependencies. This paper proposes Sequence Graph Transform (SGT), a feature embedding function that can extract any amount of short- to long-term dependencies without increasing the computation, a property that is proved theoretically. SGT features yield significantly superior results in sequence clustering and classification, with higher accuracy and lower computation than existing methods, including state-of-the-art sequence/string kernels and LSTM in deep learning.
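
To make the idea of a fixed-size embedding for arbitrary-length sequences concrete, here is a minimal Python sketch. It is not the SGT definition from the paper, which the abstract does not state. It assumes, purely for illustration, that each ordered symbol pair is scored by an exponentially decaying weight on the gap between positions, averaged over all occurrences of that pair. The function name `pairwise_decay_features` and the parameter `kappa` are hypothetical.

```python
import math
from collections import defaultdict


def pairwise_decay_features(sequence, kappa=1.0):
    """Illustrative only (an assumption, not the paper's formula).

    For every ordered symbol pair (u, v) where u occurs before v,
    average exp(-kappa * gap) over all such occurrences.
    Smaller kappa lets longer gaps contribute more to the feature.
    """
    weighted = defaultdict(float)  # sum of decayed weights per pair
    counts = defaultdict(int)      # number of occurrences per pair
    for i, u in enumerate(sequence):
        for j in range(i + 1, len(sequence)):
            v = sequence[j]
            weighted[(u, v)] += math.exp(-kappa * (j - i))
            counts[(u, v)] += 1
    return {pair: weighted[pair] / counts[pair] for pair in weighted}


# The number of features depends on the alphabet, not the sequence length.
features = pairwise_decay_features("BAAB", kappa=1.0)
```

In this sketch, lowering `kappa` shifts weight toward longer-range pairs without changing the loop structure, so the cost of a single pass does not depend on how much long-term dependency is captured. The quadratic loop here is a simplification and says nothing about the computational properties the paper proves for SGT.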
