
A Fast Randomized Algorithm for Finding the Maximal Common Subsequences

Abstract

Finding the common subsequences of $L$ strings has many applications in bioinformatics, computational linguistics, and information retrieval. A well-known result states that finding a Longest Common Subsequence (LCS) of $L$ strings is NP-hard, i.e., the computational complexity of known exact methods is exponential in $L$. In this paper, we develop a randomized algorithm, referred to as {\em Random-MCS}, for finding a random instance of a Maximal Common Subsequence (MCS) of multiple strings. A common subsequence is {\em maximal} if inserting any character into it, at any position, no longer yields a common subsequence. An LCS is the special case of an MCS with the greatest length. We show that the complexity of our algorithm is linear in $L$, which makes it suitable for large $L$. Furthermore, we study the occurrence probability of a single instance of MCS and demonstrate, via both theoretical and experimental studies, that the longest subsequence obtained from multiple runs of {\em Random-MCS} often yields a solution to LCS.
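The definition of maximality above can be made concrete with a short sketch. The Python code below is an illustration only and is not the paper's {\em Random-MCS} algorithm: it checks maximality by brute force and builds a random MCS by repeatedly applying a randomly chosen valid insertion until none remains. All function names (`is_subsequence`, `valid_insertions`, `is_maximal`, `random_maximal`) are hypothetical and introduced here for illustration.

```python
import random


def is_subsequence(s, t):
    """Return True if s can be obtained from t by deleting characters."""
    it = iter(t)
    # `ch in it` advances the iterator, so matches must occur in order.
    return all(ch in it for ch in s)


def is_common(s, strings):
    """Return True if s is a subsequence of every string in `strings`."""
    return all(is_subsequence(s, t) for t in strings)


def valid_insertions(s, strings):
    """List every (position, character) whose insertion keeps s common."""
    # Only characters present in all strings can ever be inserted.
    alphabet = set(strings[0]).intersection(*strings[1:])
    return [
        (i, ch)
        for i in range(len(s) + 1)
        for ch in sorted(alphabet)
        if is_common(s[:i] + ch + s[i:], strings)
    ]


def is_maximal(s, strings):
    """A common subsequence is maximal if no single insertion keeps it common."""
    return is_common(s, strings) and not valid_insertions(s, strings)


def random_maximal(strings, rng):
    """Grow a common subsequence by random valid insertions until maximal.

    Illustrative brute-force construction (an assumption of this sketch),
    not the linear-time method described in the abstract.
    """
    s = ""
    while True:
        options = valid_insertions(s, strings)
        if not options:
            return s
        i, ch = rng.choice(options)
        s = s[:i] + ch + s[i:]


if __name__ == "__main__":
    strings = ["abcbdab", "bdcaba"]
    # Keep the longest result over several runs, mirroring the idea of
    # taking the longest subsequence from multiple randomized runs.
    runs = [random_maximal(strings, random.Random(seed)) for seed in range(20)]
    print(max(runs, key=len))
```

Each run returns some maximal common subsequence, which may be shorter than an LCS; keeping the longest result over several runs mirrors the multiple-run strategy the abstract describes, without any guarantee that an LCS is found.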
