Faster Longest Common Extension Queries in Strings over General Alphabets

Abstract

Longest common extension queries (often called longest common prefix queries) constitute a fundamental building block in multiple string algorithms, for example computing runs and approximate pattern matching. We show that a sequence of q LCE queries for a string of size n over a general ordered alphabet can be realized in $O(q \log \log n + n \log^* n)$ time making only $O(q + n)$ symbol comparisons. Consequently, all runs in a string over a general ordered alphabets can be computed in $O(n \log \log n)$ time making $O(n)$ symbol comparisons. Our results improve upon a solution by Kosolobov (Information Processing Letters, 2016), who designed an algorithm with $O(n \log^{2/3} n)$ running time and conjectured that $O(n)$ time is possible. Our paper makes a significant progress towards resolving this conjecture. Our techniques extend to the case of general unordered alphabets, when the time increases to $O(q \log n + n \log^* n)$. The main tools are difference covers and a variant of the disjoint-sets data structure by La Poutré (SODA 1990).

Publication
CPM pp:5:1-5:13
Date