Weighted Shortest Common Supersequence Problem Revisited

Abstract

We revisit the Weighted Shortest Common Supersequence (WSCS) problem, that is, the SCS problem on weighted strings, that was introduced by Amir et al. [SPIRE 2011]. A weighted string, also known as a Position Weight Matrix (PWM), is a sequence of probability distributions over the alphabet. In the WSCS problem we are given two weighted strings $W_1$ and $W_2$ and a threshold $1/z$ on probability and are to compute the shortest (standard) string $S$ such that each of $W_1$, $W_2$ matches a subsequence of $S$ (not necessarily the same) with probability at least $1/z$. Amir et al. showed that a log-probability version of this problem is NP-complete. We present an algorithm that solves the WSCS problem for two weighted strings of length $n$ and over a constant-sized alphabet in $O(n^2\sqrt{z} \log{z})$ time. This matches conditional lower bounds since it is known that WSCS cannot be solved in $O(n^{2-\varepsilon})$ or $O^*(z^{0.5-\varepsilon})$ time unless there is a breakthrough improving upon long-standing upper bounds for fundamental NP-hard problems (CNF-SAT and Subset Sum, respectively). We also discover a fundamental difference between the WSCS problem and the Weighted Longest Common Subsequence (WLCS) problem that was introduced by Amir et al. [JDA 2010] by showing that WLCS cannot be solved in $n^{O(1)} f(z)$ time, for any function $f(z)$, unless $P=NP$.

Publication
SPIRE pp:221-238
Date