Efficient counting of square substrings in a tree


We give an algorithm which in $O(n\log^2 n)$ time counts all distinct squares in a labeled tree. There are two main obstacles to overcome. The first one is that the number of distinct squares in a tree is $\Omega(n^{4/3})$ (see Crochemore et al., 2012), which differs substantially from the case of classical strings for which there are only linearly many distinct squares. We overcome this obstacle by using a compact representation of all squares (based on maximal cyclic shifts) which requires only $O(n\log n)$ space. The second obstacle is lack of adequate algorithmic tools for labeled trees, consequently we design several novel tools, this is the most complex part of the paper. In particular we extend to trees Imre Simon's compact representations of the failure table in pattern matching machines.

Theoretical Computer Science 544:60-73
Journal (see conference version)