On the String Consensus Problem and the Manhattan Sequence Consensus Problem


We study the Manhattan Sequence Consensus problem (MSC problem) in which we are given k integer sequences, each of length $l$, and we are to find an integer sequence x of length $l$ (called a consensus sequence) such that the maximum Manhattan distance of x from each of the input sequences is minimized. A related problem, with Hamming distance instead of Manhattan distance, is called Hamming String Consensus (HSC), also known under the names of string center problem or closest string problem. For binary sequences Manhattan distance coincides with Hamming distance, hence in this case HSC is a special case of MSC. We design a practically efficient $O(l)$-time algorithm solving MSC for $k\le 5$ sequences. It improves upon the quadratic algorithm by Amir et al. (2012) for HSC for k=5 binary strings. Similarly as in the algorithm of Amir et al., we use a column-based framework. We replace the implied general integer linear programming by its easy special cases due to combinatorial properties of MSC for $k\le 5$. Practicality of our algorithms has been verified experimentally. We also show that for a general parameter k any instance can be reduced in linear time to a kernel of size $k!$, so the problem is fixed-parameter tractable. Nevertheless, for $k\ge 4$ this is still too large for any naive solution to be feasible in practice. This is a full version of an article published at SPIRE 2014.

Journal (see conference version)