Last time I introduced a neat little question: Is it true that every infinite path in $\mathbb{Z}^d$ contains arbitrarily long arithmetic progressions? For example, in the following path, the squares colored in red form an arithmetic progression of length 3: every two consecutive red squares differ by the same vector.

As it turns out, this problem has already been solved, although not exactly in this form! But I did not know this at the time, and so before I tell you the answer, I will share with you a bit about how I started to investigate it, what the directions of attack were, and what eventually worked (read that as "how I eventually found the right piece of literature").

Now, how you approach this problem depends, of course, on what you think the solution is. If you think that it's possible for paths to avoid long arithmetic progressions, a good starting place would be to try to explicitly construct such a path, perhaps by some recursive formula. On the other hand, if you think that every path must contain progressions of increasing length, it would be advisable to pay a visit to the vast literature about one-dimensional arithmetic progressions in $\mathbb{N}$ or $\mathbb{Z}$ (e.g. Szemerédi's theorem, the Green-Tao theorem).

As for me, I thought that the answer depended on the dimension $d$: My conjecture was that for $d = 2$, arithmetic progressions could not be avoided, that for larger $d$ they could, and for $d = 3$ I was undecided. My intuition was that both paths and the arithmetic progressions within them are inherently linear, one-dimensional objects; in fact, an arithmetic progression in a path in $\mathbb{Z}^d$ is just the intersection of the path with some rational-sloped line. It is generally very difficult to avoid intersecting too many lines in two dimensions, while in higher dimensions there is a lot more space for evasive maneuvers.

Based on this gut feeling, I decided to work on $\mathbb{Z}^2$ first, with the aim of showing that arithmetic progressions cannot be avoided in any path. Now, once you decide that arithmetic progressions cannot be avoided in any path, you can make your life easier and try to prove a "practice theorem" which only applies to specific subsets of all paths. For example, it is much easier to work with **positive paths**, i.e. paths in which the coordinates can only increase at every step. In $\mathbb{Z}^2$, this corresponds to paths which only go either right or up, and never left or down (English is a funny language; those who step right up are never left down!).

A very Good Thing about $\mathbb{Z}^2$ is that it sits very comfortably in the computer screen, and we can generate pretty pictures like the one at the top of this post. Indeed, one of the first things I did was to write a short computer program which, for every integer $k$, tries to find the length of the longest path that doesn't contain an arithmetic progression of length $k$. It then draws the path, together with one extra step, so that you can see why an arithmetic progression would be created. Hopefully, if indeed arithmetic progressions are unavoidable, they also appear quickly enough so that the program can hint at their existence (for the number of possible paths grows very large, very quickly).
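If you want to play along at home, here is a little Python sketch in the spirit of that program (a reconstruction of mine, not the original code): a depth-first search over monotone right/up paths, where a path is stored as its set of points, and "length" counts points, so a single point has length 1.

```python
def creates_ap(points, p, k):
    """Does adding p create k+1 equally spaced collinear points on the
    path (an arithmetic progression of length k)?  Since the path is
    monotone, p is coordinate-wise maximal, so any new progression
    must end at p."""
    for q in points:
        d = (p[0] - q[0], p[1] - q[1])
        if all((p[0] - j * d[0], p[1] - j * d[1]) in points
               for j in range(1, k + 1)):
            return True
    return False


def longest(k, cap=100):
    """Number of points of the longest monotone (right/up) path in Z^2
    with no arithmetic progression of length k, by exhaustive search.
    `cap` is a guard in case no finite bound exists."""
    best = 0

    def dfs(points, p):
        nonlocal best
        best = max(best, len(points))
        if len(points) >= cap:
            return
        for dx, dy in ((1, 0), (0, 1)):
            q = (p[0] + dx, p[1] + dy)
            if not creates_ap(points, q, k):
                points.add(q)
                dfs(points, q)
                points.remove(q)

    dfs({(0, 0)}, (0, 0))
    return best
```

Do not feed it $k = 4$ without the cap: as we are about to see, that search does not terminate.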

Any two points constitute an arithmetic progression of length 1. So for $k = 1$, this is the longest path without an arithmetic progression of length 1, together with the extra step that forces one:

For $k = 2$, this is the unique longest path (up to reflection symmetry about the line $y = x$):

There doesn't have to be a unique path, though, as the case $k = 3$ suggests:

So far, if we denote by $L(k)$ the length of the longest path without an arithmetic progression of length $k$, the program shows that $L(1) = 1$, $L(2) = 4$, and $L(3) = 10$. What about $L(4)$? Unfortunately, the program did not terminate when I plugged in $k = 4$; it found a path of length over 20,000 without an arithmetic progression of length 4, and eventually crashed due to a memory error (indeed, all art imitates life). Here are the first 1250 steps of such a path:

Now for the dire dilemma: Should we be concerned with this result? One immediate response, certainly valid, is to conclude (in the weakest sense of the word) that arithmetic progressions of length 4 can be avoided, and to try and search for an explicit construction which avoids them. Another, equally valid conclusion is to politely excuse ourselves and say that the true maximal length just happens to be very, very large, say on the order of millions, and that the program crashing due to a memory error implies nothing, because there is nothing to be implied.

Now, any Registered Practicalist will tell you that the second conclusion is farcical, and that we do not often encounter sequences that feature explosive growth of the form 1, 4, 10, 20000000. But in response, any Certified Theorist will then bring up Szemerédi’s theorem.

Szemerédi's theorem states that for every integer $k$ and every constant $\delta > 0$, if $n$ is large enough and a set $A \subseteq \{1, 2, \ldots, n\}$ has size at least $\delta n$, then $A$ contains a $k$-term arithmetic progression. The proof of the theorem is based on the famous (among the right circles) Szemerédi Regularity Lemma. Roughly speaking, this lemma says that any graph can be partitioned into a finite number of equal-sized parts, where the edges between any two parts behave in many ways almost as though they were random. The "almost" in the last sentence is controlled by a small real parameter $\varepsilon$. The issue with Szemerédi's regularity lemma is that the number of parts depends very, very badly on $\varepsilon$. Say, an exponential tower of the sort $2^{2^{\cdot^{\cdot^{2}}}}$, where the number of exponentials in the tower is of order a power of $1/\varepsilon$.
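To get a feel for the statement (certainly not for the proof!), here is a toy brute-force computation in Python, of my own making, which finds the largest subset of $\{1, \ldots, n\}$ containing no 3-term arithmetic progression:

```python
from itertools import combinations


def max_3ap_free(n):
    """Size of the largest subset of {1, ..., n} with no 3-term
    arithmetic progression, by brute-force enumeration."""
    for size in range(n, 0, -1):
        for subset in combinations(range(1, n + 1), size):
            s = set(subset)
            # Is some progression a, a+d, a+2d (d > 0) inside s?
            if not any(a + d in s and a + 2 * d in s
                       for a in s for d in range(1, n)):
                return size
    return 0
```

Szemerédi's theorem says that for any fixed density $\delta$, this quantity eventually drops below $\delta n$.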

When my program crashed, I did not see any immediate way of applying Szemerédi's theorem or lemma to the case of two-dimensional arithmetic progressions, but I could definitely imagine how a rapidly-exploding sequence such as 1, 4, 10, 20000000 could emerge from a horrendous tower of $\varepsilon$s.

The idea was to show that there exists a line with rational slope that intersects the path with positive density (showing that it intersects the path infinitely often is not enough for Szemerédi's theorem). The hope behind this was that the path would have a hard time both evading arithmetic progressions of length 4 **and** varying its slope enough to avoid intersecting lines. Unfortunately (or perhaps fortunately), I (together with my friend Ori, who was helping me) was unable to prove anything of this sort.

Defeated in one campaign, I decided to treacherously switch sides and try to prove that arithmetic progressions of length 4 can actually be avoided. The strongest lead I had to follow was the Thue-Morse sequence.

The Thue-Morse sequence is an infinite binary sequence which can be iteratively constructed as follows:

- Start with the one-letter string $T_0 = 0$.
- Construct $T_{n+1}$ by simultaneously replacing each $0$ in $T_n$ with $01$, and each $1$ with $10$. In other words, if we define a *substitution function* $\theta$ by $\theta(0) = 01$ and $\theta(1) = 10$, this is the same as saying that $T_{n+1} = \theta(T_n)$, where $\theta$ is applied bitwise to the elements of $T_n$.
- The Thue-Morse sequence $T$ is the limit as $n \to \infty$ of the finite words $T_n$.

You may have to convince yourself a bit that the limit does exist, in the sense that each $T_n$ is actually a prefix of $T_{n+1}$. This is true because $\theta$ maps the bit $0$ into the string $01$, which itself starts with $0$. The following first few iterations may help to visualize this:
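In Python, the whole construction is just a few lines (a sketch of mine; `thue_morse(n)` returns the finite word $T_n$):

```python
def thue_morse(n):
    """Return T_n: apply the substitution 0 -> 01, 1 -> 10 n times,
    starting from the one-letter word "0"."""
    sub = {"0": "01", "1": "10"}
    word = "0"
    for _ in range(n):
        word = "".join(sub[c] for c in word)
    return word


# Each T_n is a prefix of T_{n+1}:
# T_1 = 01, T_2 = 0110, T_3 = 01101001, ...
```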

The Thue-Morse sequence has the following peculiar property (which is not that trivial to prove, actually): It is **cube-free**. That is, there is no non-empty string $B$ of binary digits such that the string $BBB$ appears anywhere in $T$.

This type of result is a good starting place for our analysis of arithmetic progressions, because every monotone path in $\mathbb{Z}^d$ can be represented by an infinite word on an alphabet of size $d$: You can always assume that the path starts at the origin, and you can encode each step in each direction by a digit from $0$ to $d - 1$. When thus encoded, the Thue-Morse sequence gives the following path:

Since the path represented by the Thue-Morse sequence is cube-free, there is no series of steps which repeats itself three times in a row. This is good for us, since a series of steps which repeats itself three times in a row gives rise to an arithmetic progression of length 3. However, this does not mean that the path has no arithmetic progressions of length 3, since a progression might achieve the same overall difference vector with the steps taken in a different order. Indeed, as the computer program images above show, it is impossible to avoid having a progression of length 3. Still, it's worthwhile to check whether or not the Thue-Morse path has any longer arithmetic progressions; if the computer doesn't find any, then it would be a prime candidate for some actual theorem proving.
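Here is a sketch of that check in Python (mine, and the encoding 0 = right, 1 = up is just one possible choice): decode the word into a path and hunt for the longest progression among its points.

```python
def thue_morse_word(n):
    """n rounds of the substitution 0 -> 01, 1 -> 10, starting from "0"."""
    word = "0"
    for _ in range(n):
        word = "".join("01" if c == "0" else "10" for c in word)
    return word


def path_points(word):
    """Decode a binary word into a monotone path: 0 = right, 1 = up."""
    x = y = 0
    pts = [(0, 0)]
    for c in word:
        x, y = (x + 1, y) if c == "0" else (x, y + 1)
        pts.append((x, y))
    return pts


def longest_ap(points):
    """Longest arithmetic progression among the points, measured in
    difference steps (so k+1 equally spaced points have length k)."""
    pts, best = set(points), 0
    for p in points:
        for q in points:
            d = (q[0] - p[0], q[1] - p[1])
            if d == (0, 0):
                continue
            k = 1
            while (p[0] + (k + 1) * d[0], p[1] + (k + 1) * d[1]) in pts:
                k += 1
            best = max(best, k)
    return best
```

I ran this on a few thousand steps of the Thue-Morse path; the verdict is below.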

But alas, fate is not so generous:

So much for Thue-Morse.

The problem is that while the Thue-Morse sequence promises that no block $B$ of letters can exactly repeat itself three times (or more), it says absolutely nothing whatsoever about repeated blocks which are **permutations** of the symbols in $B$. Since $\mathbb{Z}^2$ is an Abelian group, we do not care about the order of the steps in a block, but only about the number of steps in each direction.

The next logical step in our endeavour is therefore to find out whether a stronger form of the Thue-Morse word exists, which not only prohibits the same block from appearing 4 times in a row, but also prohibits a block and its permutations from appearing 4 times in a row. If such a word exists, then the path it represents does not have any arithmetic progression of length 4 or more.
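This stronger property is easy to test on finite prefixes. A quick Python sketch of mine:

```python
from collections import Counter


def has_abelian_power(word, n):
    """True if the word contains n consecutive blocks of equal length
    which are all permutations of each other (an abelian n-th power).
    Counter compares the blocks as multisets, ignoring symbol order."""
    N = len(word)
    for size in range(1, N // n + 1):
        for start in range(N - n * size + 1):
            first = Counter(word[start:start + size])
            if all(Counter(word[start + i * size:start + (i + 1) * size]) == first
                   for i in range(1, n)):
                return True
    return False
```

For example, `has_abelian_power("0110", 2)` is true: take the two blocks "01" and "10".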

A Clever Mathematician might be able to find such a generalized Thue-Morse word, and truly this is the place where I turned to the existing literature to see if a Clever Mathematician already did so. As it turns out, he did!

**Theorem (F. M. Dekking, 1978):** There exists a sequence on two symbols in which no four blocks occur consecutively that are permutations of each other.

Wow, that pretty much nails it, doesn’t it?

Since this post didn't have any proofs until now, only pretty pictures, I'd like to present Dekking's proof (but still with pretty pictures). It is short, but for me it sort of comes out of nowhere (maybe because there were already similar, slightly weaker ideas floating around at the time of its writing, but I only saw the finalized form). I can convince myself that it's true, but I'm not exactly sure why it's true. Like the Thue-Morse sequence, Dekking's word is generated iteratively by a substitution function. The function itself is simple, and you can easily construct the final word yourself, if you want to. The proof that the limit word has no consecutive permutations of blocks, however, relies on a short, clever analysis, with several finely tailored requirements that are seemingly almost magically satisfied by his construction.

Dekking's paper is only 5 pages long, so you can definitely read it yourself, but I will repeat the main claims anyway, and add some pictures which I think make the proof a tad clearer. As stated above, the theorem follows from a technical lemma concerning sequences generated by substitution functions. To state it, we need several quick definitions:

**Symbol definitions:** Let $\Sigma$ be a set of symbols. A *block* of size $k$ is a finite word $B = b_1 b_2 \cdots b_k$, where each $b_i \in \Sigma$. The *frequency vector* of a block is a row vector of size $|\Sigma|$ whose $i$-th component is the number of times that the $i$-th symbol appears in the block. The *frequency matrix* of a substitution function $\theta$ is the matrix whose $i$-th row is the frequency vector of the image of the $i$-th symbol under $\theta$. For $0 \le j \le k$, we can decompose $B$ into two smaller blocks, $B = (b_1 \cdots b_j)(b_{j+1} \cdots b_k)$. The block $b_1 \cdots b_j$ is called a *left-subblock* of $B$.

**Group definitions:** Let $G$ be an additive finite Abelian group. A subset $W \subseteq G$ is called *progression-free* of order $m$ if $w$, $w + d$, …, $w + (m-1)d \in W$ altogether implies that $d = 0$. A function $g: \Sigma \to G$ can be extended to blocks by defining $g(B) = g(b_1) + g(b_2) + \cdots + g(b_k)$. Such a function is called *$\theta$-injective* if for any symbols $s_1, \ldots, s_m \in \Sigma$ and any collection of left-subblocks $B_i$ of $\theta(s_i)$, the equality $g(B_1) = \cdots = g(B_m)$ implies that either all the $B_i$ are equal, or all the complementary right-subblocks are equal.
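These definitions are all mechanical to compute. As a quick sketch (mine), here they are in code, together with the observation that the Thue-Morse substitution has a singular frequency matrix, so it stands no chance of satisfying the non-singularity condition in the lemma below:

```python
from collections import Counter


def freq_vector(block, symbols):
    """The number of occurrences of each symbol in the block."""
    counts = Counter(block)
    return [counts[s] for s in symbols]


def freq_matrix(theta, symbols):
    """Row i is the frequency vector of theta applied to the i-th symbol."""
    return [freq_vector(theta[s], symbols) for s in symbols]


# Thue-Morse: both rows come out (1, 1), so the matrix is singular.
print(freq_matrix({"0": "01", "1": "10"}, "01"))  # -> [[1, 1], [1, 1]]
```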

We can now state the main lemma:

**Lemma:** Let $n \ge 2$, let $\theta$ be a substitution function on a set of symbols $\Sigma$, let $G$ be a finite Abelian group, and let $g: \Sigma \to G$ be a map, such that the following four conditions are met:

- The frequency matrix of $\theta$ is non-singular,
- For each symbol $s \in \Sigma$, we have $g(\theta(s)) = 0$,
- The set $W$ of values of $g$ on the left-subblocks of the images $\theta(s)$ is progression-free of order $n + 1$,
- $g$ is $\theta$-injective.

Then any sequence generated by iterating $\theta$ (like how Thue-Morse was generated from $\theta(0) = 01$, $\theta(1) = 10$) does not have any $n$ blocks which occur consecutively that are permutations of each other. For us, this means in particular that for $n = 4$, the path represented by the sequence does not have a length-4 arithmetic progression.

Those conditions are quite specific, aren’t they?

**Proof:** Roughly speaking, the conditions of the lemma all gang up to say that any consecutive blocks which are permutations of each other must have found their way into the sequence by applying the substitution function to some other, shorter consecutive blocks. But this cannot go on forever, because eventually we would need to create such blocks from single letters.

More formally: Let $T$ be the infinite string generated by iterating $\theta$:

For simplicity, we will omit the actual symbols in our graphic representation, and display $T$ as just a rectangle:

Assume by contradiction that $T$ has $n$ blocks $U_1, \ldots, U_n$ occurring consecutively which are all permutations of each other. In particular, there is such a set of blocks with the length of each block minimal, and with $U_1$ starting leftmost of all such blocks.

Since $T$ is generated by $\theta$, it can be split up into a concatenation of $\theta$-blocks, each the image of a single symbol (since $T = \theta(T)$). These $\theta$-blocks do not necessarily have to have the same length. Each $U_i$ starts in some $\theta$-block, which we call $K_i$, where $i = 1, \ldots, n$. The final block $U_n$ ends in some $\theta$-block which we call $K_{n+1}$. The situation looks somewhat like this:

(Actually, it may also happen that several blocks $U_i$ fit into the same $\theta$-block, or that there is a $\theta$-block so small that it fits entirely within a block $U_i$. The treatment of these cases is similar to the case displayed above, and since it only overburdens the proof, I chose to omit it. For now, let's just assume that the picture above is the correct one.)

We can decompose each $\theta$-block $K_i$ into a left-subblock $L_i$ and a right-subblock $R_i$, so that $R_i$ starts at the same place as $U_i$ for $i = 1, \ldots, n$, and so that $L_{n+1}$ ends at the same place as $U_n$:

Let us calculate the values of $g(U_i)$. From the above image, we can see by additivity of $g$ that

$$g(U_1) = g(R_1) + g(\text{full } \theta\text{-blocks between } K_1 \text{ and } K_2) + g(L_2),$$

and in general

$$g(U_i) = g(R_i) + g(\text{full } \theta\text{-blocks between } K_i \text{ and } K_{i+1}) + g(L_{i+1}).$$

By condition (2) in the statement of the lemma, $g(\theta(s)) = 0$ for all $s \in \Sigma$, so the full $\theta$-blocks in the middle contribute nothing; and since $K_i$ is the concatenation of $L_i$ and $R_i$, this gives

$$g(R_i) = g(K_i) - g(L_i) = -g(L_i).$$

Plugging this in the equation above, we have

$$g(U_i) = g(L_{i+1}) - g(L_i).$$

Now, since the blocks $U_i$ are all permutations of each other, and since $G$ is Abelian, we have that

$$g(U_1) = g(U_2) = \cdots = g(U_n),$$

as each $g(U_i)$ is just the sum of $g$ over all individual symbols in the block. We then get

$$g(L_{i+1}) - g(L_i) = g(L_{i+2}) - g(L_{i+1}) \quad \text{for all } i,$$

implying that the $g(L_i)$ with $i = 1, \ldots, n+1$ form an arithmetic sequence in $G$! But from condition (3) in the statement of the lemma, any arithmetic sequence with $n+1$ elements whose values are $g(L_i)$ for some left-subblocks $L_i$ must have its difference element equal to 0. Thus

$$g(L_1) = g(L_2) = \cdots = g(L_{n+1}).$$

By condition (4), this either means that all the $L_i$'s are equal to each other, or that all the $R_i$'s are equal to each other. Suppose that it's the $R_i$'s that are equal (a similar proof will work for the $L_i$'s). Now we know that our initial image was a bit off, and these blocks do need to have the same length:

Looking at the above picture, write the run of full $\theta$-blocks strictly between $K_i$ and $K_{i+1}$ as $\theta(V_i)$ for some block $V_i$, and let $s_{i+1}$ be the symbol whose image is $K_{i+1}$, so that $U_i = R_i \, \theta(V_i) \, L_{i+1}$. Since $R_{i+1} = R_i$, and since $U_{i+1}$ is a permutation of $U_i$, we immediately get that $\theta(V_{i+1}) L_{i+2}$ is a permutation of $\theta(V_i) L_{i+1}$. Appending the common right-subblock, this means that $\theta(V_{i+1} s_{i+2})$ is a permutation of $\theta(V_i s_{i+1})$! The same argument actually holds for all $i$, and so we get that the blocks $\theta(V_1 s_2), \ldots, \theta(V_n s_{n+1})$ are permutations of each other. But if these blocks are permutations of each other, then their frequency vectors are the same, and since the frequency vector of an image $\theta(W)$ is just the frequency vector of $W$ multiplied by the frequency matrix, which by assumption (1) in the lemma is non-singular, the blocks $V_1 s_2, \ldots, V_n s_{n+1}$ have the same frequency vectors as well; that is, they are themselves permutations of each other. And these blocks appear consecutively in $T$ itself, since $T = \theta(T)$. If their length is larger than 1, this is a contradiction to the minimality of the $U_i$, since the new blocks are shorter. If they have length 1, then each of them is just a single symbol, all of them equal, and since they appear in $T$ strictly to the left of the $U_i$ (since $\theta$ must in general blow up strings in order to generate an infinite word), this contradicts the assumption that $U_1$ is the leftmost minimal permutation-repeating block. Contradiction!

(Note: In the original proof, which handles the case where blocks $U_i$ can be directly contained in $\theta$-blocks, it is not these blocks that are necessarily permutations of each other, but rather a collection of blocks which are the direct image under $\theta$ of some shorter blocks. By similar reasoning as above, it turns out that those shorter blocks are permutations of each other, and the proof proceeds.)

Whew!

Having proved the lemma, all we need to do is find the right substitution $\theta$, the right Abelian group $G$, and the right function $g$ which all satisfy the conditions (1) through (4). A simple task, easily solved by either divine inspiration or computer enumeration (I do not know which of these Dekking chose). In any case, here is the all-star league:

- The substitution function is given by $\theta(0) = 011$ and $\theta(1) = 0001$.
- The group is $G = \mathbb{Z}_5$.
- The function $g$ is given by $g(0) = 1$ and $g(1) = 2$.

All that is left to do is to check that all four conditions of the lemma apply.

- The frequency vector of $\theta(0) = 011$ is $(1, 2)$, and the frequency vector of $\theta(1) = 0001$ is $(3, 1)$. The frequency matrix is therefore $\begin{pmatrix} 1 & 2 \\ 3 & 1 \end{pmatrix}$, whose determinant is $1 - 6 = -5 \neq 0$, so it is non-singular.
- The function $g$ is $0$ on $\theta$-blocks: $g(\theta(0)) = 1 + 2 + 2 \equiv 0$ and $g(\theta(1)) = 1 + 1 + 1 + 2 \equiv 0$ in $\mathbb{Z}_5$.
- The left-subblocks of $\theta(0)$ are $\varepsilon$, $0$, $01$, and those of $\theta(1)$ are $\varepsilon$, $0$, $00$, $000$, on which the function $g$ attains the values $0$, $1$, $3$ and $0$, $1$, $2$, $3$, respectively. Crucially, the set $W$ as defined in the lemma does not contain the element $4$, and is equal to $\{0, 1, 2, 3\}$. This set is a progression-free set of order $5$: Since $5$ is prime, any $5$-element arithmetic progression with non-zero difference in $\mathbb{Z}_5$ covers all of $\mathbb{Z}_5$, and in particular must contain the element $4$.
- You can similarly check that $g$ is $\theta$-injective.

Since $W$ is progression-free of order $5 = 4 + 1$, by the lemma the sequence generated by $\theta$ does not have 4 consecutive blocks which are permutations of each other. We are done!
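As a sanity check of both the construction and my transcription of it, here is a short Python sketch (mine, not Dekking's) that iterates the substitution and verifies, on a finite prefix, that no 4 consecutive equal-length blocks are permutations of each other:

```python
from collections import Counter

# Dekking's substitution and weight function, as transcribed above.
THETA = {"0": "011", "1": "0001"}
G_WEIGHT = {"0": 1, "1": 2}  # values in Z_5


def dekking_prefix(rounds):
    """Iterate theta starting from "0"; each result is a prefix of the
    infinite fixed point."""
    word = "0"
    for _ in range(rounds):
        word = "".join(THETA[c] for c in word)
    return word


def has_abelian_power(word, n):
    """True if some n consecutive equal-length blocks are permutations
    of each other."""
    N = len(word)
    for size in range(1, N // n + 1):
        for start in range(N - n * size + 1):
            first = Counter(word[start:start + size])
            if all(Counter(word[start + i * size:start + (i + 1) * size]) == first
                   for i in range(1, n)):
                return True
    return False


# Condition (1): the frequency matrix ((1, 2), (3, 1)) is non-singular.
m = [[Counter(THETA[s])[c] for c in "01"] for s in "01"]
assert m[0][0] * m[1][1] - m[0][1] * m[1][0] != 0

# Condition (2): g vanishes on theta-images, in Z_5.
assert all(sum(G_WEIGHT[c] for c in THETA[s]) % 5 == 0 for s in "01")

# No abelian 4th power in a 129-letter prefix (abelian cubes, such as
# the literal cube "000" inside theta("1"), do appear).
assert not has_abelian_power(dekking_prefix(4), 4)
```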

That was quite a ride. Let me conclude it with a last remark about higher dimensions. The above theorem and computer search show that there exist monotone paths in $\mathbb{Z}^2$ which manage to avoid arithmetic progressions of length 4 (and therefore any larger lengths). What about paths in higher dimensions? Well, in the same paper and using the same lemma, Dekking proved that there are monotone paths in $\mathbb{Z}^3$ which avoid progressions of length 3. The unavoidable question, "what about monotone paths in $\mathbb{Z}^4$ which avoid progressions of length 2?", had to wait about 15 years, but this too was solved in the affirmative by Keränen, using a substitution function which mapped each symbol into an 85-letter string!

Until next time, here are two quick puzzles.

- Regarding the original question about arithmetic progressions in paths in $\mathbb{Z}^d$: is it easier to avoid arithmetic progressions if we consider all paths, and not just monotone paths?
- You can convince yourself that generating simple random walks is a lousy way of constructing paths which avoid arithmetic progressions. Still, it would be nice to get a decent bound on how many paths we need to generate until we get one that is free of the rascals. What is the probability that a simple random walk of $n$ steps in $\mathbb{Z}^2$ has an arithmetic progression of length $k$?

* I found the reference to Dekking’s paper in the Notes section of the first chapter of the book Automatic Sequences: Theory, Applications, Generalizations by Jean-Paul Allouche and Jeffrey Shallit. It seems like a nice book, with plenty of exercises, and I hope to read it one day. The generalized Thue-Morse words are called “Abelian power-free”, or “strongly non-repetitive”.