Linear-Time Algorithm for Long LCF with k Mismatches

Abstract

In the Longest Common Factor with k Mismatches ($LCF_k$) problem, we are given two strings X and Y of total length n, and we are asked to find a pair of maximal-length factors, one of X and the other of Y, such that their Hamming distance is at most k. Thankachan et al. [Thankachan et al. 2016] show that this problem can be solved in $O(n \log^k n)$ time and $O(n)$ space for constant k. We consider the $LCF_k(l)$ problem in which we assume that the sought factors have length at least l. We use difference covers to reduce the $LCF_k(l)$ problem with $l=\Omega(\log^{2k+2}n)$ to a task involving $m=O(n/\log^{k+1}n)$ synchronized factors. The latter can be solved in $O(m \log^{k+1}m)$ time, which results in a linear-time algorithm for $LCF_k(l)$ with $l=\Omega(\log^{2k+2}n)$. In general, our solution to the $LCF_k(l)$ problem for arbitrary $l$ takes $O(n + n \log^{k+1} n/\sqrt{l})$ time.

Publication
CPM pp:23:1-23:16
Date