Mikołaj Bojańczyk

## 11. Profinite words

Define the size of a regular language , denoted by ,  to be the smallest size of a monoid which recognises it, i.e. the smallest size of such that there is some monoid homomorphism which recognises .

Define the distance between two words to be the size of the smallest language which separates them, i.e. which contains one but not the other. Note that the size of a language is the same as the size of its complement, so it does not matter whether the separating language contains but not , or the other way round. This is easily seen to be a distance, in fact it is an ultrametric: if we have a language that separates and , and a language that separates and , then one of these two languages must separate and , and therefore Define a Cauchy sequence of words to be sequence of words such that for every , there is some tail of the sequence where all pairs of words are at distance at most . By unraveling the definitions, this means that for every regular language, there is a tail of the sequence where all words are in the language, or all words are outside the language. Define the profile of a Cauchy sequence to be the set of those regular languages for which there is some tail where all words belong to . Define two Cauchy sequences to be equivalent if mixing them in some (equivalently, any) way yields a Cauchy sequence. An alternative but equivalent definition is that two Cauchy sequences are equivalent if they have the same profiles.

Definition 1. profinite word over alphabet is a Cauchy sequence modulo equivalence.

Every normal word can be viewed as a profinite word, namely the constant Cauchy sequence. Observe that if a Cauchy sequence contains some word infinitely often, then it must be (ultimately) constant. This is because the singleton language is regular, and therefore some tail of the sequence must be contained in this singleton or be disjoint with it. Therefore, profinite words are either normal words, or are Cauchy sequences whose lengths that tend to infinity.

Example. Here is an example of a non-constant Cauchy sequence, i.e. of a profinite word that is not a normal word. If the sequence is to be Cauchy, some tail of the sequence must give the same answer to the regular language “words of even length”. In particular, the sequence where the -th word violates the Cauchy condition, because every tail contains words of both even and odd lengths. This argument can be applied also to lengths divisible by three, four, etc. This motivates considering the sequence where the -th word is . One can show that this sequence is indeed Cauchy, and actually it would remain Cauchy if would be replaced by some fixed word. Lemma. Consider a family of regular languages over a common alphabet . Then this family is the profile of some profinite word (i.e. some Cauchy sequence) if and only if it has the following properties:
a) all languages in the family are nonempty;
b) the family is closed under intersection;
c) for every regular language , the family contains either or its complement, but not both.

Proof.  (In the lecture, condition a) was replaced by a’), which said that the family is upward closed, but in the presence of the other two conditions, a) and a’) are equivalent.) It is easy to check that the profile of a Cauchy sequence satisfies all three conditions a), b) and c). Let us then prove that if a family satisfies these three conditions, then it is the profile of some Cauchy sequence. For a natural number , consider the intersection of all languages that belong to and which have size at most . By condition b), this intersection also belongs to , and by condition a) it must contain some word, call it . We claim that the sequence thus defined is Cauchy. We need to show that for every regular language , some tail of this sequence is contained in either or its complement. Let be the size of . If belongs to , then the tail is contained in . Otherwise, by condition c), the complement of must belong to , and since the complement has the same size, it follows that the tail is contained in the complement of . The above lemma shows that mapping a Cauchy sequence to its profile is a one-to-one correspondence between Cauchy sequences and families of regular languages that satisfy the three conditions in the lemma. This motivates an alternative definition of profinite words: a profinite word is a family of regular languages satisfying the three conditions in the lemma.

The rest of the lecture about profinite words has two parts, given in subpages of this page: