Mikołaj Bojańczyk

## Data words

July 14, 2015
1. Mikołaj Bojańczyk, Claire David, Anca Muscholl, Thomas Schwentick, Luc Segoufin
Two-variable logic on data words. ACM Trans. Comput. Log., 2011   PDF

2. Henrik Björklund, Mikołaj Bojańczyk
Shuffle Expressions and Words with Nested Data. MFCS, 2007   PDF

3. Mikołaj Bojańczyk
Automata for Data Words and Data Trees. RTA, 2010   PDF

A data word is like a word over an infinite alphabet. (A related topic is data trees, discussed here.) The definition used in the papers above is that each position contains a label from a finite set, plus a data value from an infinite set. The data values can only be compared for equality, so a typical property of words is: “there exist two positions which have the same data value and both have label “. (A more general notion of data word is obtained by using atoms, discussed here.) These papers study the automata-logic connection for data words. The problem is that this connection is not so robust as in the case of finite alphabets, and there are lots of different models which achieve different trade-offs between expressivity and decidability.  The first two papers are about two variable first-order logic, and the third one is a (slightly dated) survey of the topic.

The paper  in the list above is a journal version of

• Mikołaj Bojańczyk, Anca Muscholl, Thomas Schwentick, Luc Segoufin, Claire David
Two-Variable Logic on Words with Data. LICS, 2006   PDF

It shows that over data words, satisfiability is decidable for two-variable first-order logic with predicates for successor, order, labels, and equal data value. The paper introduces an automaton model, called data automata, which subsumes the logic and has decidable emptiness, by a reduction to reachability in vector addition systems (there is also a reduction from vector addition systems to the logic, and therefore there is little hope for an efficient algorithm, given the current hopeless state of reachability for vector additions systems).

The paper  continues the study of first-order logic with two variables. Here we study the variant where the order predicate is disallowed (but successor is allowed, along with the label and equal data value predicates). In exchange, the logic allows several data values per position. It is shown that the problem is undecidable, but becomes decidable when an additional nesting structure is imposed. The proof uses a lot of Presburger arithmetic.