Papers

Mikołaj Bojańczyk

March 15, 2021

“Languages recognised by finite semigroups…” book

Here you can find the current version of a book that I am writing about algebraic language theory, which is called “Languages recognised by finite semigroups and their generalisations to objects such as trees and graphs, with an emphasis on definability in monadic second-order logic”. The book begins with some classical theory of finite semigroups, such as Green’s relations, but it is mainly devoted to generalisations beyond finite words (trees, graphs, or infinite words), which uses the methodology of monads and their Eilenberg-Moore algebras.

April 3, 2019

MSO with probability

Mikołaj Bojańczyk
Thin MSO with a Probabilistic Path Quantifier. ICALP, 2016 PDF
Mikołaj Bojańczyk, Hugo Gimbert, Edon Kelmendi
Emptiness of Zero Automata Is Decidable. ICALP, 2017 PDF
Mikołaj Bojańczyk, Edon Kelmendi, Michal Skrzypczak
MSO+∇ is undecidable. LICS, 2019 PDF

Consider a probabilistic extension of MSO over infinite binary trees, which allows formulas saying “almost surely a branch $\pi \in \set{0,1}^\omega$ satisfies $\varphi(\pi)$ ”. This logic was introduced by Mio and Michalewski, and it generalises known (qualitative) probabilistic logics such as probabilistic CTL or probabilistic CTL*. Paper [3] shows that this logic is undecidable. On the other hand, the weak fragment of the logic (where the usual set quantifiers are restricted to range over finite sets) is decidable, which follows from papers [1] and [2]. Paper [1] shows that the weak fragment of the probabilistic logic (and a bit more) can be compiled into an automaton model, while paper [2] shows that emptiness for the automaton model is decidable.

October 22, 2018

Polyregular functions

Mikołaj Bojańczyk
Polyregular Functions. CoRR, 2018 PDF
Mikołaj Bojańczyk, Sandra Kiefer, Nathan Lhote
String-to-String Interpretations With Polynomial-Size Output. ICALP, 2019 PDF

Paper [1] (see also these slides) discusses a class of string-to-string functions, which goes beyond rational or regular functions, but still shares many of their good properties (e.g. the inverse image of a regular language is regular). This class is called the polyregular functions. Unlike the regular string-to-string functions (and their subclasses such as sequential or rational), which have linear size increase, the polyregular functions have polynomial size increase. The main selling point of the polyregular functions is that they admit numerous equivalent descriptions:

automata with pebbles (this definition comes from here and here);
a fragment of lambda calculus (this definition, as well as the other two below, is new to the author’s best knowledge);
a fragment of for-programs;
compositions of certain atomic functions, such as reverse or a form of string squaring.

Paper [2] adds another equivalent description to the list above, namely

5. mso interpretations (which are like mso transductions, except that one output position is interpreted in a k-tuple of input positions.

November 30, 2017

Automata Toolbox book

This is the current of “An Automata Toolbox”

It is a book of lecture notes about automata theory, based on a course at the University of Warsaw. The main goal is to give an intuitive – and visual as often as possible – understanding of important constructions in automata theory.

The current version is an alpha version, and has not changed much for the last 5 years. I do plan eventually to update it, but this may be delayed a long time. In the meantime, you can find the sources on github, and fix anything on your own. If you fix anything, don’t forget to add yourself to the acknowledgements, and let me know if you have doubts about or questions about your changes.

Before github, there was a page with mistakes, but I did not do a good job with integrating the corrections into the text.

November 14, 2016

Atom book

This is the current version of a short book about atoms that I am writing. The book begins with a discussion of automata models for infinite alphabets. Then, inspired by these automata, the book develops the theory of orbit-finite sets with atoms and computation on them.

The book was in hiatus for 6 years, but I returned to it in 2025, while giving a course about the topic. The new version has some new material (vector spaces) and deletes some of the automata material from the previous version.

July 6, 2016

MSO on graphs

Mikołaj Bojańczyk, Michal Pilipczuk
Definability equals recognizability for graphs of bounded treewidth. LICS, 2016 PDF
Mikołaj Bojańczyk, Michal Pilipczuk
Optimizing Tree Decompositions in MSO. STACS, 2017 PDF
Mikołaj Bojańczyk
Two monads for graphs. CoRR, 2018 PDF

In paper [1] (see also the arxiv version and these slides) we show that for every $k$ there is an MSO transduction, which inputs a graph of tree width at most $k$ , and outputs a tree decomposition of width at most $f(k)$ . The transduction is nondeterministic, which means it might output several possible tree decompositions, but at least one. A corollary of this result is that a conjecture of Courcelle is true: for languages of graphs of bounded tree width, definability in MSO is equivalent to recognisability.

In paper [2], we show that for every $k$ there is an MSO transduction which inputs a width $k$ tree decomposition and outputs an optimal width tree decomposition of the same graph. Combining these two results, we see that for every $k$ there is an MSO transduction which maps graphs of treewidth $k$ to their optimal width tree decompositions.

Paper [3] uses the terminology of monads to describe recognisable languages of graphs. This is meant as an alternative description of Courcelle’s VR and HR recognisable graph classes. The paper also has a new result, namely it is decidable if an CMSO definable languages of graphs (of bounded treewidth) can be defined in MSO (i.e. it is decidable if the counting, which is the C in CMSO, is necessary).

July 14, 2015

XML algorithms

Mikołaj Bojańczyk, Pawel Parys
XPath evaluation in linear time. J. ACM, 2011 PDF
Mikołaj Bojańczyk, Pawel Parys
Efficient Evaluation of Nondeterministic Automata Using Factorization Forests. ICALP (1), 2010 PDF
Mikołaj Bojańczyk, Diego Figueira
Efficient evaluation for a temporal logic on changing XML documents. PODS PDF

This is a group of algorithms that work with XML documents in linear or almost linear time. This is in contrast with the XML motivated algorithms on data words and data trees, where typically the running time is exponential or much worse. The first paper shows that XPath can be evaluated in linear time (improving on previous quadratic algorithms), while the second paper tries to explain this phenomenon using automata theory. The last paper considers evolving XML documents. It describes a logic that can express properties like “if a node in the document gets label $a$ at some point in time, then at later points in time all of its descendants will have label $b$ “. The main contribution is an algorithm that runs in time $c \cdot n \cdot \log n$ where $n$ is the number of edits, and $c$ is a constant that depends (unfortunately, in a nonelementary way) on the formula.

July 14, 2015

Algebra for infinite trees

Mikołaj Bojańczyk, Tomasz Idziaszek
Algebra for Infinite Forests with an Application to the Temporal Logic EF. CONCUR, 2009 PDF
Mikołaj Bojańczyk, Filippo Cavallari, Thomas Place, Michal Skrzypczak
Regular tree languages in low levels of the Wadge Hierarchy. Log. Methods Comput. Sci., 2019 PDF
Tomasz Idziaszek, Michal Skrzypczak, Mikołaj Bojańczyk
Regular Languages of Thin Trees. Theory Comput. Syst., 2016 PDF
Mikołaj Bojańczyk, Bartek Klin
A non-regular language of infinite trees that is recognizable by a sort-wise finite algebra. Log. Methods Comput. Sci., 2019 PDF

There are algebras for finite words (monoids or semigroups), infinite words (e.g. ω-semigroups or Wilke semigroups) and finite trees (e.g. clones or forest algebras). What about infinite trees? In [1] we propose a notion of algebra for infinite trees, which is an extension of forest algebra. It is not completely satisfactory (there is no result that finite algebras can be represented in a finite way), but it is good enough to characterize some logics, e.g. the temporal logic EF. Paper [2], which is the journal version of an ICALP 2012 paper, continues the study of algebras for infinite trees, by giving a structural characterization of some random logic (namely, Boolean combinations of sets that are open in the Borel topology). The idea is to learn what kind of structural theory of regular tree languages is needed to do nontrivial characterizations. Paper [3], which is the journal version of a STACS 2013 paper, shows that most of the technical problems go away when considering thin trees, which are infinite trees with countably many branches. Paper [4] gives a counterexample of sorts – it shows a language of infinite trees which is not regular, but where a natural Myhill-Nerode equivalence has finite index.

July 14, 2015

Algebra for finite trees

Mikołaj Bojańczyk, Igor Walukiewicz
Forest algebras. Logic and Automata, 2008 PDF
Mikołaj Bojańczyk
Two-Way Unary Temporal Logic over Trees Log. Methods Comput. Sci., 2009 PDF
Mikołaj Bojańczyk, Luc Segoufin, Howard Straubing
Piecewise testable tree languages Log. Methods Comput. Sci., 2012 PDF
Mikołaj Bojańczyk, Luc Segoufin
Tree Languages Defined in First-Order Logic with One Quantifier Alternation Log. Methods Comput. Sci., 2010 PDF
Mikołaj Bojańczyk, Howard Straubing, Igor Walukiewicz
Wreath Products of Forest Algebras, with Applications to Tree Logics Log. Methods Comput. Sci., 2012 PDF

The main motivation for this group of papers is the following decision problem, whose decidability remains open: decide if a given regular language of finite trees can be defined by a formula of first-order logic that uses predicates for the labels and the descendant relation. In other words, the question is about the tree version of the Schützenberger theorem. Since Schützenberger was so successful with monoids, we thought it would be a good idea to introduce something like monoids for trees, which is the topic of paper [1] that introduces forest algebra. Unfortunately, we were unable to use these monoids to characterize all of first-order logics, but at least some fragments can be described in terms of structural properties of the underlying forest algebras. A survey of algebras for trees can be found here, it is supposed to appear in an upcoming but much-delayed Handbook of Automata Theory.

July 14, 2015

Atoms

Mikołaj Bojańczyk
Nominal Monoids. Theory Comput. Syst., 2013 PDF
Mikołaj Bojańczyk, Bartek Klin, Slawomir Lasota 0001
Automata theory in nominal sets. Log. Methods Comput. Sci., 2014 PDF
Mikołaj Bojańczyk, Slawomir Lasota 0001
A Machine-Independent Characterization of Timed Languages. ICALP (2), 2012 PDF
Mikołaj Bojańczyk, Laurent Braud, Bartek Klin, Slawomir Lasota 0001
Towards nominal computation. POPL, 2012 PDF
Mikołaj Bojańczyk, Szymon Torunczyk
Imperative Programming in Sets with Atoms. FSTTCS, 2012 PDF
Mikołaj Bojańczyk, Thomas Place
Toward Model Theory with Data Values. ICALP (2), 2012 PDF
Mikołaj Bojańczyk, Luc Segoufin, Szymon Torunczyk
Verification of database-driven systems via amalgamation. PODS, 2013 PDF
Mikołaj Bojańczyk
Modelling Infinite Structures with Atoms. WoLLIC, 2013 PDF
Mikołaj Bojańczyk, Bartek Klin, Slawomir Lasota 0001, Szymon Torunczyk
Turing Machines with Atoms. LICS, 2013 PDF

Atoms are a theory project that we are developing in Warsaw, which even has its blog and a book. The project was originally motivated by trying to understand automata on data words and data trees. The idea is to consider sets with ur-elements that are called atoms, and to consider a different notion of finiteness: a set is called orbit-finite if it has finitely many elements up to permutations of the atoms.

July 14, 2015

Data trees

Mikołaj Bojańczyk, Anca Muscholl, Thomas Schwentick, Luc Segoufin
Two-variable logic on data trees and XML reasoning. J. ACM, 2009 PDF
Henrik Björklund, Mikołaj Bojańczyk
Bounded Depth Data Trees. ICALP, 2007 PDF
Mikołaj Bojańczyk, Slawomir Lasota 0001
An extension of data automata that captures XPath Log. Methods Comput. Sci., 2012 PDF
Vince Bárány, Mikołaj Bojańczyk, Diego Figueira, Pawel Parys
Decidable classes of documents for XPath. FSTTCS, 2012 PDF
Mikołaj Bojańczyk, Filip Murlak, Adam Witkowski
Containment of Monadic Datalog Programs via Bounded Clique-Width. ICALP (2), 2015 PDF

These are papers about data trees, which are the tree version of data words, i.e. these are trees where each node has a label from a finite alphabet and a data value from an infinite alphabet, with the provision that data values can only be compared for equality. The motivation behind data trees is that they are supposed to be like XML documents. This is a sequence of papers which tries to tackle the satisfiability problem for various logics over data trees. All of the papers use automata that run over data trees.

July 14, 2015

Data words

Mikołaj Bojańczyk, Claire David, Anca Muscholl, Thomas Schwentick, Luc Segoufin
Two-variable logic on data words. ACM Trans. Comput. Log., 2011 PDF
Henrik Björklund, Mikołaj Bojańczyk
Shuffle Expressions and Words with Nested Data. MFCS, 2007 PDF
Mikołaj Bojańczyk
Automata for Data Words and Data Trees. RTA, 2010 PDF

A data word is like a word over an infinite alphabet. (A related topic is data trees, discussed here.) The definition used in the papers above is that each position contains a label from a finite set, plus a data value from an infinite set. The data values can only be compared for equality, so a typical property of words is: “there exist two positions which have the same data value and both have label $a$ “. (A more general notion of data word is obtained by using atoms, discussed here.) These papers study the automata-logic connection for data words. The problem is that this connection is not so robust as in the case of finite alphabets, and there are lots of different models which achieve different trade-offs between expressivity and decidability. The first two papers are about two variable first-order logic, and the third one is a (slightly dated) survey of the topic.

July 14, 2015

MSO+U

Mikołaj Bojańczyk
A Bounding Quantifier. CSL, 2004 PDF
Mikołaj Bojańczyk, Thomas Colcombet
Bounds in w-Regularity. LICS, 2006 PDF
Mikołaj Bojańczyk, Pawel Parys, Szymon Torunczyk
The MSO+U Theory of (N, <) Is Undecidable. STACS, 2016 PDF
Mikołaj Bojańczyk
Weak MSO with the Unbounding Quantifier. Theory Comput. Syst., 2011 PDF
Mikołaj Bojańczyk, Szymon Torunczyk
Deterministic Automata and Extensions of Weak MSO. FSTTCS, 2009 PDF
Mikołaj Bojańczyk, Szymon Torunczyk
Weak MSO+U over infinite trees. STACS, 2012 PDF
Mikołaj Bojańczyk
Weak MSO+U with Path Quantifiers over Infinite Trees. ICALP (2), 2014 PDF

The logic MSO+U is the extension of MSO with the quantifier U, where $UX \phi(X)$ says that there are arbitrarily large finite sets $X$ which make $\phi(X)$ true. The point of the logic is to express boundedness properties like: “a graph has bounded outdegree” or “a sequence of numbers tends to infinity”. When introducing the logic [1], I thought that it would be decidable. The first signs were promising, e.g. the paper [2] showed that counter automata could be used to decide fragments of the logic. Now we know that the full logic is undecidable [3], but the problems seem to come from quantifying over infinite sets, and weak fragments are decidable [4,5,6,7]. Here are some slides.

May 17, 2015

Tree-walking automata

Mikołaj Bojańczyk, Thomas Colcombet
Tree-walking automata cannot be determinized. Theor. Comput. Sci., 2006 PDF
Mikołaj Bojańczyk, Thomas Colcombet
Tree-Walking Automata Do Not Recognize All Regular Languages. SIAM J. Comput., 2008 PDF
Mikołaj Bojańczyk
Tree-Walking Automata. LATA, 2008 PDF

A tree-walking automaton is a natural model of automaton for trees. If you would be absolutely fixed on the idea that an automaton must have a single head, then this is the model you would come up with. The idea is that the automaton has a single head, and traverses the tree by going up and down, typically doing something like a depth-first search. This is a series of papers devoted to showing that the model is not well-behaved. The first paper shows they cannot be determinised, the second that they are strictly weaker than the classical model of branching tree automata, and the third is a survey of the topic.

May 17, 2015

Finite satisfiability for two-way μ-calculus and guarded fixpoint logic

Mikołaj Bojańczyk
Two-Way Alternating Automata and Finite Models. ICALP, 2002 PDF
Vince Bárány, Mikołaj Bojańczyk
Finite satisfiability for guarded fixpoint logic. Inf. Process. Lett., 2012 PDF

The two-way modal μ-calculus is a logic which does not have the finite model property, i.e. some formulas are satisfiable but not satisfiable in any finite model. The same holds for guarded fixpoint logic, a logic introduced by Grädel and Walukiewicz. These two papers show that for both of these logics, one can decide if a given formula has a finite model. For the first paper, about the μ-calculus, the solution boils down to finding trees where certain patterns have bounded size. (Trying to understand this boundedness, I later introduced the logic MSO+U.) For the second paper, about guarded fixpoint logic, one needs to use the first paper plus results on cutting and pasting in models for guarded logics by Bárány, Gottlob and Otto.

April 9, 2015

Recognisable Languages over Monads

Mikołaj Bojańczyk
Recognisable languages over monads. CoRR, 2015 PDF

There are regular languages for: finite words, finite trees, infinite words, various kinds of infinite trees, queries, tuples of words, tuples of trees, etc etc. This paper shows that monads, a concept from category theory, can be used to unify some of the theory of “regularity”. The idea is that for each setting, like finite words or labelled countable total orders, or infinite trees with countably many branches, one chooses a monad, and then many definitions and some theorems come out for free. Here are some slides.

April 9, 2015

Transducers with origin information

Mikołaj Bojańczyk
Transducers with Origin Information. ICALP (2), 2014 PDF
Mikołaj Bojańczyk, Laure Daviaud, Bruno Guillon, Vincent Penelle
Which Classes of Origin Graphs Are Generated by Transducers. ICALP, 2017 PDF

The idea proposed in [1] is to change the semantics of a transducer, by adding information about which positions in the output originate from which positions in the input. Under such semantics, called origin semantics, fewer transducers are equivalent to each other, e.g. the reverse and identity transformations over a one letter alphabet are not equivalent under origin semantics, but they are equivalent under standard semantics. The paper shows that with origin semantics, many technical and combinatorial problems go away. There is also an implicit and debatable philosophical point, which is that origin semantics better reflects the “true essence” of a transducer, because when thinking of a transducer one also typically has in mind some origin information. Here are some slides.

Paper [2] continues the work on origin semantics. The origin semantics of a transduction can be seen as a set of graphs, which consist of an input string, an output string, and arrows from positions of the output to positions in the input which describe the origin. In [2], we study when a set of such graphs (“origin graphs”) arises from regular transductions. We identify such sets in terms of structural conditions, such as bounded treewidth or bounded outdegree for origins.