Order-preserving indexing

M. Crochemore, C. S. Iliopoulos, T. Kociumaka, M. Kubica, A. Langiu, S. P. Pissis, J. Radoszewski, W. Rytter, T. Waleń

June, 2016

Abstract

Kubica et al. [33] and Kim et al. [29] introduced order-preserving pattern matching: for a given text the goal is to find its factors having the same shape as a given pattern. Known results include a linear-time algorithm for this problem (in case of polynomially-bounded alphabet) and a generalization to multiple patterns. We propose an index that enables order-preserving pattern matching queries in time proportional to pattern length. The index can be constructed in $O (n \log \log n)$ expected time or in $O (n \log 2 \log n / \log \log \log n)$ worst-case time. It is an incomplete order-preserving suffix tree which may miss a single edge label at each branching node. For most applications such incomplete suffix trees provide the same functional power as the complete ones. We show a number of their applications, including computation of longest common factors, longest previously occurring factors and squares in a string in the order-preserving setting. We also give an $O (n \sqrt{\log n})$ -time algorithm constructing complete order-preserving suffix trees.

Publication

Theoretical Computer Science 638:122-135

Date

June, 2016

Links

Publisher version BibTeX