5. Piecewise testable languages

Mikołaj Bojańczyk

zajęcia / courses

The Schützenberger theorem shows which languages are definable in first-order logic. What about less expressive logics? One direction is to study formulas where the quantification pattern is somehow limited. This idea has been seen a lot of work, of which we only describe here the simplest quantification patterns.

Existential formulas. As a warmup, we will talk about existential formulas. Define an existential, also known as $\exists^*$ formula, to be a first-order formula of the form

$\exists x_1 \exists x_2 \cdots \exists x_n \varphi(x_1,\ldots,x_n)$

where $\varphi$ is quantifier free and uses predicates as in the Schützenberger theorem, namely unary predicates for the labels and a binary predicate for the order. For example, the existential formula

$\exists x \exists y \ a(x) \land b(y) \land x < y$

defines the words where some $a$ is followed by some $b$ .

When talking about full first-order logic, the successor predicate does not need to be added to the vocabulary, because it can be defined in terms of the order predicate. However, for existential formulas, this is not the case, and therefore existential formulas which allow both order and successor would have different power than existential formulas with only order. We will not talk about the successor here, although it is studied.

Define the Higman ordering on words to be the ordering where $w \le v$ holds if one can remove some letters, not necessarily in a connected way, to go from $v$ to $w$ . This is also known as the subsequence ordering. For example “ila” is a subsequence of “Mikolaj”. We will use the following result:

Lemma (Higman’s Lemma) For a fixed alphabet, the Higman ordering is a well quasi-order, which means that it is well founded and it does not have infinite antichains.

Proof. For a finite or infinite sequence of words $w_1,w_2,\ldots$ define a growth to be positions $i < j$ such that the word $w_i$ is a subsequence of $w_j$ . Define a growth-free sequence to be a sequence of words without a growth. Antichains and violations of well-foundedness are examples of growth-free sequences, and therefore to prove the lemma it suffices to show that there is no infinite growth-free sequence.

Define the radix ordering on words as follows: shorter words come before longer ones, and same length words are ordered lexicographically. To prove the lemma, suppose that an infinite growth-free sequence sequence exists. Define a sequence $w_1,w_2,\ldots$ by induction as follows. The word $w_1$ is the least word, in the radix ordering (which is well-founded, so it makes sense to talk about least words), which is the beginning of some infinite growth-free sequence. For $n > 1$ , define $w_n$ to be the least word in the radix ordering such that $w_1,\ldots,w_n$ is the beginning of some infinite growth-free sequence, in particular $w_1,\ldots,w_n$ is a finite growth-free sequence. Growth-free sequences are closed under limits, and therefore $w_1,w_2,\ldots$ is an infinite growth free sequence.

Consider the sequence $w_1,w_2,\ldots$ defined in the previous paragraph. Because the alphabet is finite, there must be some letter, call it $a$ , such that infinitely many words in the sequence, say with indexes $n_1 < n_2 < \cdots$ , begin with the letter $a$ . Define a new sequence as follows. Let $n$ be the first index such that $w_n$ begins with $a$ . Remove from the sequence all words after $w_n$ that do not begin with $a$ , and for all words that do begin with $a$ , remove the first letter. It is not difficult to show that the new sequence is also growth-free, but it contradicts the construction of $w_1,\ldots,w_n$ because we have just shortened the word $w_n$ , and therefore decreased its position in the radix ordering. $\Box$

From Higman’s Lemma, we obtain the following result.

Theorem. A language $L \subseteq A^*$ is definable by an existential formula if and only if it is upward closed with respect to the Higman ordering.

Proof. The left-to-right implication is immediate, because adding letters can only make an existential formula more true. Let us focus on the right-to-left implication. For a set of words $X$ , let us write $X \uparrow$ for its upward closure with respect to the Higman ordering. Suppose that $L$ is upward closed. Define $\min L$ to be the elements of $L$ that are minimal with respect to the Higman ordering. Clearly $\min L$ is an antichain, and therefore by the Higman Lemma it is finite. Finally, observe that

$L = (\min L)\uparrow$

The inclusion $\supseteq$ is a consequence of the assumption that $L$ is upward closed, while for the $\subseteq$ inclusion we use well-foundedness of the Higman ordering to prove that for every word $w \in L$ , there is some minimal word that is smaller than it. It is easy to see that for every finite set, its upward closure is definable by an existential formula, and therefore $L$ is definable by an existential formula. $\Box$

This completes our study of languages defined by existential formulas. Let us move to a more interesting case, namely Boolean combinations of existential formulas.

Boolean combinations of existential formulas. Boolean combinations of existential formulas, sometimes also known as piecewise testable languages, are a much more interesting class. In particular, every finite language is a boolean combination of existential formulas. For example, the language which contains only the word abaa is equal to the set of words which contain abaa as a subsequence, but which do not contain any subsequence of length 5. A beautiful result by Imre Simon characterises piecewise testable languages as follows.

Theorem (Piecewise is $\Jj$ -trival). A language is piecewise testable if and only its syntactic monoid is $\Jj$ -trivial, i.e. all $\Jj$ -classes are singletons.

A corollary of the above theorem is that one can decide if a language is piecewise testable (whatever the use of that could be), by computing the syntactic monoid, and then computing the $\Jj$ -classes.

We will not prove this theorem, but a very similar one, which uses separation. We say that two languages $L,K \subseteq A^*$ are separated by a language $M$ if $M$ contains $L$ but is disjoint with $K$ . We will prove the following result.

Theorem (Piecewise Separation). It is decidable if two regular languages can be separated by a piecewise testable one.

A corollary of the above theorem is the same as the corollary of the theorem on piecewise being $\Jj$ -trivial, namely one can decide if a language is piecewise testable. This is because a language is piecewise testable if and only if it can be separated from its complement by a piecewise testable language.

The proof of the Piecewise Separation Theorem will consist of two parts. In part 1, we will show that piecewise separability can be characterised in terms of something called zigzags. In part 2, we will show that zigzags can be characterised by a certain game.

zajęcia / courses

5. Piecewise testable languages

Leave a Reply