On the String Consensus Problem and the Manhattan Sequence Consensus Problem

Abstract

In the Manhattan Sequence Consensus problem (MSC problem) we are given $k$ integer sequences, each of length $l$, and we are to find an integer sequence $x$ of length $l$ (called a consensus sequence), such that the maximum Manhattan distance of $x$ from each of the input sequences is minimized. For binary sequences Manhattan distance coincides with Hamming distance, hence in this case the string consensus problem (also called string center problem or closest string problem) is a special case of MSC. Our main result is a practically efficient $O(l)$-time algorithm solving MSC for $k\le5$ sequences. Practicality of our algorithms has been verified experimentally. It improves upon the quadratic algorithm by Amir et al. (SPIRE 2012) for string consensus problem for $k=5$ binary strings. Similarly as in Amir’s algorithm we use a column-based framework. We replace the implied general integer linear programming by its easy special cases, due to combinatorial properties of the MSC for $k\le5$. We also show that for a general parameter k any instance can be reduced in linear time to a kernel of size $k!$, so the problem is fixed-parameter tractable. Nevertheless, for $k\ge4$ this is still too much for any naive solution to be feasible in practice.

Publication
SPIRE pp:244-255
Date