# Modal model theory

Let me introduce to you the topic of modal model theory, injecting some ideas from modal logic into the traditional subject of model theory in mathematical logic.

For example, we may consider the class of all models of some first-order theory, such as the class of all graphs, or the class of all groups, or all fields or what have you. In general, we have $\newcommand\Mod{\text{Mod}}\Mod(T)$, where $T$ is a first-order theory in some language $L$.


In this way, we enlarge the language $L$ to include these modal operators. Let $\possible(L)$ be the language obtained by closing $L$ under the modal operators and Boolean connectives; and let $L^\possible$ also close under quantification. The difference is whether a modal operator falls under the scope of a quantifier.

Recently, in a collaborative project with Wojciech Aleksander Wołoszyn, we made some progress, which I’d like to explain. (We also have many further results, concerning the potentialist validities of various natural instances of $\Mod(T)$, but those will wait for another post.)

Theorem. If models $M$ and $N$ are elementarily equivalent, that is, if they have the same theory in the language of $L$, then they also have the same theory in the modal language $\possible(L)$.

Proof. We show that whenever $M\equiv N$ in the language of $L$, then $M\models\varphi\iff N\models\varphi$ for sentences $\varphi$ in the modal language $\possible(L)$, by induction on $\varphi$.

Of course, by assumption the statement is true for sentences $\varphi$ in the base language $L$. And the property is clearly preserved by Boolean combinations. What remains is the modal case. Suppose that $M\equiv N$ and $M\models\possible\varphi$. So there is some extension model $M\subset W\models\varphi$.

Since $M\equiv N$, it follows by the Keisler-Shelah theorem that $M$ and $N$ have isomorphic ultrapowers $\prod_\mu M\cong\prod_\mu N$, for some ultrafilter $\mu$. It is easy to see that isomorphic structures satisfy exactly the same modal assertions in the class of all models of a theory. Since $M\subset W$, it follows that the ultrapower of $M$ is extended to (a copy of) the ultrapower of $W$, and so $\prod_\mu M\models\possible\varphi$, and therefore also $\prod_\mu N\models\possible\varphi$. From this, since $N$ embeds into its ultrapower $\prod_\mu N$, it follows also that $N\models\possible\varphi$, as desired. $\Box$

Corollary. If one model elementarily embeds into another $M\prec N$, in the language $L$ of these structures, then this embedding is also elementary in the language $\possible(L)$.

Proof. To say $M\prec N$ in language $L$ is the same as saying that $M\equiv N$ in the language $L_M$, where we have added constants for every element of $M$, and interpreted these constants in $N$ via the embedding. Thus, by the theorem, it follows that $M\equiv N$ in the language $\possible(L_M)$, as desired. $\Box$

For example, every model $M$ is elementarily embedding into its ultrapowers $\prod_\mu M$, in the language $\possible(L)$.

We’d like to point out next that these results do not extend to elementary equivalence in the full modal language $L^\possible$.

For a counterexample, let’s work in the class of all simple graphs, in the language with a binary predicate for the edge relation. (We’ll have no parallel edges, and no self-edges.) So the accessibility relation here is the induced subgraph relation.

Lemma. The 2-colorability of a graph is expressible in $\possible(L)$. Similarly for $k$-colorability for any finite $k$.

Proof. A graph is 2-colorable if we can partition its vertices into two sets, such that a vertex is in one set if and only if all its neighbors are in the other set. This can be effectively coded by adding two new vertices, call them red and blue, such that every node (other than red and blue) is connected to exactly one of these two points, and a vertex is connected to red if and only if all its neighbors are connected to blue, and vice versa. If the graph is $2$-colorable, then there is an extension realizing this statement, and if there is an extension realizing the statement, then (even if more than two points were added) the original graph must be $2$-colorable. $\Box$

A slightly more refined observation is that for any vertex $x$ in a graph, we can express the assertion, “the component of $x$ is $2$-colorable” by a formula in the language $\possible(L)$. We simply make the same kind of assertion, but drop the requirement that every node gets a color, and insist only that $x$ gets a color and the coloring extends from a node to any neighbor of the node, thereby ensuring the full connected component will be colored.

Theorem. There are two graphs that are elementary equivalent in the language $L$ of graph theory, and hence also in the language $\possible(L)$, but they are not elementarily equivalent in the full modal language $L^\possible$.

Proof. Let $M$ be a graph consisting of disjoint copies of a 3-cycle, a 5-cycle, a 7-cycle, and so on, with one copy of every odd-length cycle. Let $M^*$ be an ultrapower of $M$ by a nonprincipal ultrafilter.

Thus, $M^*$ will continue to have one 3-cycle, one 5-cycle, one 7-cycle and on on, for all the finite odd-length cycles, but then $M^*$ will have what it thinks are non-standard odd-length cycles, except that it cannot formulate the concept of “odd”. What it actually has are a bunch of $\mathbb{Z}$-chains.

In particular, $M^*$ thinks that there is an $x$ whose component is $2$-colorable, since a $\mathbb{Z}$-chain is $2$-colorable.

But $M$ does not think that there is an $x$ whose component is $2$-colorable, because an odd-length finite cycle is not $2$-colorable. $\Box$.

Since we used an ultrapower, the same example also shows that the corollary above does not generalize to the full modal language. That is, we have $M$ embedding elementarily into its ultrapower $M^*$, but it is not elementary in the language $L^\possible$.

Let us finally notice that the Łoś theorem for ultraproducts fails even in the weaker modal language $\possible(L)$.

Theorem. There are models $M_i$ for $i\in\mathbb{N}$ and a sentence $\varphi$ in the language of these models, such that every nonprincipal ultraproduct $\prod_\mu M_i$ satisfies $\possible\varphi$, but no $M_i$ satisfies $\possible\varphi$. .

Proof. In the class of all graphs, using the language of graph theory, let the $M_i$ be all the odd-length cycles. The ultraproduct $\prod_\mu M_i$ consists entirely of $\mathbb{Z}$-chains. In particular, the ultraproduct graph is $2$-colorable, but none of the $M_i$ are $2$-colorable. $\Box$

# The connect-infinity game!

I saw the following image on Twitter and Reddit, an image suggesting an entire class of infinitary analogues of the game Connect-Four. What fun! Let’s figure it out!

I’m not sure to whom the image or the idea is due. Please comment if you have information. (See comments below for current information.)

The rules will naturally generalize those in Connect-Four. Namely, starting from an empty board, the players take turns placing their coins into the $\omega\times 4$ grid. When a coin is placed in a column, it falls down to occupy the lowest available cell. Let us assume for now that the game proceeds for $\omega$ many moves, whether or not the board becomes full, and the goal is to make a connected sequence in a row of $\omega$ many coins of your color (you don’t have to fill the whole row, but rather a connected infinite segment of it suffices). A draw occurs when neither or both players achieve their goals.

In the $\omega\times 6$ version of the game that is shown, and indeed in the $\omega\times n$ version for any finite $n$, I claim that neither player can force a win; both players have drawing strategies.

Theorem. In the game Connect-$\omega$ on a board of size $\omega\times n$, where $n$ is finite, neither player has a winning strategy; both players have drawing strategies.

Proof. For a concrete way to see this, observe that either player can ensure that there are infinitely many of their coins on the bottom row: they simply place a coin into some far-out empty column. This blocks a win for the opponent on the bottom row. Next, observe that neither player can afford to follow the strategy of always answering those moves on top, since this would lead to a draw, with a mostly empty board. Thus, it must happen that infinitely often we are able to place a coin onto the second row. This blocks a win for the opponent on the second row. And so on. In this way, either players can achieve infinitely many of their coins on each row, thereby blocking any row as a win for their opponent. So both players have drawing strategies. $\Box$

Let me point out that on a board of size $\omega\times n$, where $n$ is odd, we can also make this conclusion by a strategy-stealing argument. Specifically, I argue first that the first player can have no winning strategy. Suppose $\sigma$ is a winning strategy for the first player on the $\omega\times n$ board, with $n$ odd, and let us describe a strategy for the second player. After the first move, the second player mentally ignores a finite left-initial segment of the playing board, which includes that first move and with a odd number of cells altogether in it (and hence an even number of empty cells remaining); the second player will now aim to win on the now-empty right-side of the board, by playing as though playing first in a new game, using strategy $\sigma$. If the first player should ever happen to play on the ignored left side of the board, then the second player can answer somewhere there (it doesn’t matter where). In this way, the second player plays with $\sigma$ as though he is the first player, and so $\sigma$ cannot be winning for the first player, since in this way the second player would win in this stolen manner.

Similarly, let us argue by strategy-stealing that the second player cannot have a winning strategy on the board $\omega\times n$ for odd finite $n$. Suppose that $\tau$ is a winning strategy for the second player on such a board. Let the first player always play at first in the left-most column. Because $n$ is odd, the second player will eventually have to play first in the second or later columns, leaving an even number of empty cells in the first column (perhaps zero). At this point, the first player can play as though he was the second player on the right-side board containing only that fresh move. If the opponent plays again to the left, then our player can also play in that region (since there were an even number of empty cells). Thus, the first player can steal the strategy $\tau$, and so it cannot be winning for the second player.

I am unsure about how to implement the strategy stealing arguments when $n$ is even. I shall give this more thought. In any case, the theorem for this case was already proved directly by the initial concrete argument, and in this sense we do not actually need the strategy stealing argument for this case.

Meanwhile, it is natural also to consider the $n\times\omega$ version of the game, which has only finitely many columns, each infinite. The players aim to get a sequence of $\omega$-many coins in a column. This is clearly impossible, as the opponent can prevent a win by always playing atop the most recent move. Thus:

Theorem. In the game Connect-$\omega$ on a board of size $n\times\omega$, where $n$ is finite, neither player has a winning strategy; both players have drawing strategies.

Perhaps the most natural variation of the game, however, occurs with a board of size $\omega\times\omega$. In this version, like the original Connect-Four, a player can win by either making a connected row of $\omega$ many coins, or a connected column or a connected diagonal of $\omega$ many coins. Note that we orient the $\omega$ size column upwards, so that there is no top cell, but rather, one plays by selecting a not-yet-filled column and then occupying the lowest available cell in that column.

Theorem. In the game Connect-$\omega$ on a board of size $\omega\times\omega$, neither player has a winning strategy. Both players have drawing strategies.

Proof. Consider the strategy-stealing arguments. If the first player has a winning strategy $\sigma$, then let us describe a strategy for the second player. After the first move, the second player will ignore finitely many columns at the left, including that first actual move, aiming to play on the empty right-side of the board as though the first player using stolen strategy $\sigma$ (but with colors swapped). This will work fine, as long as the first player also plays on that part of the board. Whenever the first player plays on the ignored left-most part, simply respond by playing atop. This prevents a win in that part of the part, and so the second player will win on the right-side by pretending to be first there. So there can be no such winning strategy $\sigma$ for the first player.

If the second player has a winning strategy $\tau$, then as before let the first player always play in the first column, until $\tau$ directs the second player to play in another column, which must eventually happen if $\tau$ is winning. At that moment, the first player can pretend to be second on the subboard omitting the first column. So $\tau$ cannot have been winning after all for the second player. $\Box$

In the analysis above, I was considering the game that proceeded in time $\omega$, with $\omega$ many moves. But such a play of the game may not actually have filled the board completely. So it is natural to consider a version of the game where the players continue to play transfinitely, if the board is not yet full.

So let us consider now the transfinite-play version of the game, where play proceeds transfinitely through the ordinals, until either the board is filled or one of the players has achieved the winning goal. Let us assume that the first player also plays first at limit stages, at $\omega$ and $\omega\cdot 2$ and $\omega^2$, and so on, if game play should happen to proceed for that long.

The concrete arguments that I gave above continue to work for the transfinite-play game on the boards of size $\omega\times n$ and $n\times\omega$.

Theorem. In the transfinite-play version of Connect-$\omega$ on boards of size $\omega\times n$ or $n\times\omega$, where $n$ is finite, neither player can have a winning strategy. Indeed, both players can force a draw while also filling the board in $\omega$ moves.

Proof. It is clear that on the $n\times\omega$ board, either player can force each column to have infinitely many coins of their color, and this fills the board, while also preventing a win for the opponent, as desired.

On the $\omega\times n$ board, consider a variation of the strategy I described above. I shall simply always play in the first available empty column, thereby placing my coin on the bottom row, until the opponent also plays in a fresh column. At that moment, I shall play atop his coin, thereby placing another coin in the second row; immediately after this, I also play subsequently in the left-most available column (so as to force the board to be filled). I then continue playing in the bottom row, until the opponent also does, which she must, and then I can add another coin to the second row and so on. By always playing the first-available second-row slot with all-empty second rows to the right, I can ensure that the opponent will eventually also make a second-row play (since otherwise I will have a winning condition on the second row), and at such a moment, I can also make a third-row play. By continuing in this way, I am able to place infinitely many coins on each row, while also forcing that the board becomes filled. $\Box$

Unfortunately, the transfinite-play game seems to break the strategy-stealing arguments, since the play is not symmetric for the players, as the first player plays first at limit stages.

Nevertheless, following some ideas of Timothy Gowers in the comments below, let me show that the second player has a drawing strategy.

Theorem. In the transfinite-play version of Connect-$\omega$ on a board of size $\omega\times\omega$, the second player has a drawing strategy.

Proof. We shall arrange that the second player will block all possible winning configurations for the first player, or to have column wins for each player. To block all row wins, the second player will arrange to occupy infinitely many cells in each row; to block all diagonal wins, the second player will aim to occupy infinitely many cells on each possible diagonal; and to block the column wins, the second player will aim either to have infinitely many cells on each column or to copy a winning column of the opponent on another column.

To achieve these things, we simply play as follows. Take the columns in successive groups of three. On the first column in each block of three, that is on the columns indexed $3m$, the second player will always answer a move by the first player on this column. In this way, the second player occupies every other cell on these columns—all at the same height. This will block all diagonal wins, because every diagonal winning configuration will need to go through such a cell.

On the remaining two columns in each group of three, columns $3m+1$ and $3m+2$, let the second player simply copy moves of the opponent on one of these columns by playing on the other. These moves will therefore be opposite colors, but at the same height. In this way, the second player ensures that he has infinitely many coins on each row, blocking the row wins. And also, this strategy ensures that in these two columns, at any limit stage, either neither player has achieved a winning configuration or both have.

Thus, we have produced a drawing strategy for the second player. $\Box$

Thus, there is no advantage to going first. What remains is to determine if the first player also has a drawing strategy, or whether the second player can actually force a win.

Gowers explains in the comments below also how to achieve such a copying mechanism to work on a diagonal, instead of just on a column.

I find it also fascinating to consider the natural generalizations of the game to larger ordinals. We may consider the game of Connect-$\alpha$ on a board of size $\kappa\times\lambda$, for any ordinals $\alpha,\kappa,\lambda$, with transfinite play, proceeding until the board is filled or the winning conditions are achieved. Clearly, there are some instances of this game where a player has a winning strategy, such as the original Connect-Four configuration, where the first player wins, and presumably many other instances.

Question. In which instances of Connect-$\alpha$ on a board of size $\kappa\times\lambda$ does one of the players have a winning strategy?

It seems to me that the groups-of-three-columns strategy described above generalizes to show that the second player has at least a drawing strategy in Connect-$\alpha$ on board $\kappa\times\lambda$, whenever $\alpha$ is infinite.

Stay tuned…

# Alan Turing, On computable numbers

I have been reading Alan Turing’s paper, On computable numbers, with an application to the entsheidungsproblem, an amazing classic, written by Turing while he was a student in Cambridge. This is the paper in which Turing introduces and defines his Turing machine concept, deriving it from a philosophical analysis of what it is that a human computer is doing when carrying out a computational task.

The paper is an incredible achievement. He accomplishes so much: he defines and explains the machines; he proves that there is a universal Turing machine; he shows that there can be no computable procedure for determining the validities of a sufficiently powerful formal proof system; he shows that the halting problem is not computably decidable; he argues that his machine concept captures our intuitive notion of computability; and he develops the theory of computable real numbers.

What I was extremely surprised to find, however, and what I want to tell you about today, is that despite the title of the article, Turing adopts an incorrect approach to the theory of computable numbers. His central definition is what is now usually regarded as a mistaken way to proceed with this concept.

Let me explain. Turing defines that a computable real number is one whose decimal (or binary) expansion can be enumerated by a finite procedure, by what we now call a Turing machine. You can see this in the very first sentence of his paper, and he elaborates on and confirms this definition in detail later on in the paper.

He subsequently develops the theory of computable functions of computable real numbers, where one considers computable functions defined on these computable numbers. The computable functions are defined not on the reals themselves, however, but on the programs that enumerate the digits of those reals. Thus, for the role they play in Turing’s theory, a computable real number is not actually regarded as a real number as such, but as a program for enumerating the digits of a real number. In other words, to have a computable real number in Turing’s theory is to have a program for enumerating the digits of a real number. And it is this aspect of Turing’s conception of computable real numbers where his approach becomes problematic.

One specific problem with Turing’s approach is that on this account, it turns out that the operations of addition and multiplication for computable real numbers are not computable operations. Of course this is not what we want.

The basic mathematical fact in play is that the digits of a sum of two real numbers $a+b$ is not a continuous function of the digits of $a$ and $b$ separately; in some cases, one cannot say with certainty the initial digits of $a+b$, knowing only finitely many digits, as many as desired, of $a$ and $b$.

To see this, consider the following sum $a+b$
\begin{align*} &0.343434343434\cdots \\ +\quad &0.656565656565\cdots \\[-7pt] &\hskip-.5cm\rule{2in}{.4pt}\\ &0.999999999999\cdots \end{align*}
If you add up the numbers digit-wise, you get $9$ in every place. That much is fine, and of course we should accept either $0.9999\cdots$ or $1.0000\cdots$ as correct answers for $a+b$ in this instance, since those are both legitimate decimal representations of the number $1$.

The problem, I claim, is that we cannot assign the digits of $a+b$ in a way that will depend only on finitely many digits each of $a$ and $b$. The basic problem is that if we inspect only finitely many digits of $a$ and $b$, then we cannot be sure whether that pattern will continue, whether there will eventually be a carry or not, and depending on how the digits proceed, the initial digits of $a+b$ can be affected.

In detail, suppose that we have committed to the idea that the initial digits of $a+b$ are $0.999$, on the basis of sufficiently many digits of $a$ and $b$. Let $a’$ and $b’$ be numbers that agree with $a$ and $b$ on those finite parts of $a$ and $b$, but afterwards have all $7$s. In this case, the sum $a’+b’$ will involve a carry, which will turn all the nines up to that point to $0$, with a leading $1$, making $a’+b’$ strictly great than $1$ and having decimal representation $1.000\cdots00005555\cdots$. Thus, the initial-digits answer $0.999$ would be wrong for $a’+b’$, even though $a’$ and $b’$ agreed with $a$ and $b$ on the sufficiently many digits supposedly justifying the $0.999$ answer. On the other hand, if we had committed ourselves to $1.000$ for $a+b$, on the basis of finite parts of $a$ and $b$ separately, then let $a”$ and $b”$ be all $2$s beyond that finite part, in which case $a”+b”$ is definitely less than $1$, making $1.000$ wrong.

Therefore, there is no algorithm to compute the digits of $a+b$ continuously from the digits of $a$ and $b$ separately. It follows that there can be no computable algorithm for computing the digits of $a+b$, given the programs that compute $a$ and $b$ separately, which is how Turing defines computable functions on the computable reals. (This consequence is a subtly different and stronger claim, but one can prove it using the Kleene recursion theorem. Namely, let $a=.343434\cdots$ and then consider the program to enumerate a number $b$, which will begin with $0.656565$ and keep repeating $65$ until it sees that the addition program has given the initial digits for $a+b$, and at this moment our program for $b$ will either switch to all $7$s or all $2$s in such a way so as to refute the result. The Kleene recursion theorem is used in order to know that indeed there is such a self-referential program enumerating $b$.)

One can make similar examples showing that multiplication and many other very simple functions are not computable, if one insists that a computable number is an algorithm enumerating the digits of the number.

So what is the right definition of computable number? Turing was right that in working with computable real numbers, we want to be working with the programs that compute them, rather than the reals themselves somehow. What is needed is a better way of saying that a given program computes a given real.

The right definition, widely used today, is that we want an algorithm not to compute exactly the digits of the number, but rather, to compute approximations to the number, as close as desired, with a known degree of accuracy. One can define a computable real number as a computable sequence of rational numbers, such that the $n^{th}$ number is within $1/2^n$ of the target number. This is equivalent to being able to compute rational intervals around the target real, of size less than any specified accuracy. And there are many other equivalent ways to do it. With this concept of computable real number, then the operations of addition, multiplication, and so on, all the familiar operations on the real numbers, will be computable.

But let me clear up a confusing point. Although I have claimed that Turing’s original definition of computable real number is incorrect, and I have explained how we usually define this concept today, the mathematical fact is that a real number $x$ has a computable presentation in Turing’s sense (we can compute the digits of $x$) if and only if it has a computable presentation in the contemporary sense (we can compute rational approximations to any specified accuracy). Thus, in terms of which real numbers we are talking about, the two approaches are extensionally the same.

Let me quickly prove this. If a real number $x$ is computable in Turing’s sense, so that we can compute the digits of $x$, then we can obviously compute rational approximations to any desired accuracy, simply by taking sufficiently many digits. And conversely, if a real number $x$ is computable in the contemporary sense, so we can compute rational approximations to any specified accuracy, then either it is itself a rational number, in which case we can certainly compute the digits of $x$, or else it is irrational, in which case for any specified digit place, we can wait until we have a rational approximation forcing it to one side or the other, and thereby come to know this digit. (Note: there are issues of intuitionistic logic occurring here, precisely because we cannot tell from the approximation algorithm itself which case we are in.) Note also that this argument works in any desired base.

So there is something of a philosophical problem here. The issue isn’t that Turing has misidentified particular reals as being computable or non-computable or has somehow got the computable reals wrong extensionally as a subset of the real numbers, since every particular real number has Turing’s kind of representation if and only if it has the approximation kind of representation. Rather, the problem is that because we want to deal with computable real numbers by working with the programs that represent them, Turing’s approach means that we cannot regard addition as a computable function on the computable reals. There is no computable procedure that when given two programs for enumerating the digits of real numbers $a$ and $b$ returns a program for enumerating the digits of the sum $a+b$. But if you use the contemporary rational-approximation representation of computable real number, then you can computably produce a program for the sum, given programs for the input reals. This is the sense in which Turing’s approach is wrong.

# The propagation of error in classical geometry constructions

I’d like to discuss the issue of error and error propagation in the constructions of classical geometry. How does error propagate in these constructions? How sensitive are the familiar classical constructions to small errors in the use of the straightedge or compass?

Let me illustrate what I have in mind by considering the classical construction of Apollonius of the perpendicular bisector of a line segment $AB$.  One forms two circles, centered at $A$ and $B$ respectively, each with radius $AB$. These arcs intersect at points $P$ and $Q$, respectively, which form the perpendicular bisector, meeting the original segment at the midpoint $C$.

That is all fine and good, and one can easily prove that indeed $PQ$ is perpendicular to $AB$ and that $C$ is the midpoint of $AB$, as desired.

When carrying out such a construction in practice, however, there will inevitably be some small errors. We do not expect to implement it exactly, with infinite precision, but rather, we expect some small errors in the placement of the compass or straightedge, and perhaps these errors may accumulate and they propagate through the construction. What I would like to discuss is the sensitivity of this construction and the other classical constructions to these small errors.

For example, suppose that we are given points $A$ and $B$. When we seek to construct the circle centered at $A$ with radius $AB$, we place the point of the compass at $A$, and this placement may have some small error deviating from $A$, landing somewhere in the blue circle. Similarly, the writing (or etching) implement end of the compass is placed at $B$, with its own possibly different error, landing the orange circle at $B$. The arc actually resulting will be some arc arising with some such small errors, like this:

We may represent the space of all arcs that could arise in conformance with those error bounds as the blurred orange arc below. This image was created simply by drawing many dozens of such arcs in orange, with various choices for the center and radius within the error circles, and blending the results together.

We carry out the same construction with similar errors for the other arc, centered at $B$ and passing through $A$. These arcs overlap in the darker orange regions above and below, determining the points $P$ and $Q$.

The actual arcs we draw and the corresponding vertical will land somewhere inside these blurred regions, perhaps like this:

Note that in this particular case, the resulting line $PQ$ is noticably non-perpendicular to $AB$, and the resulting point $C$ is noticably not the midpoint. Consider the space of all the bisectors $PQ$ that might arise in conformance with our errors on $A$ and $B$, showing the result as the vertical red shaded region.   The darker red region is the space of possible points $C$ that we might have constructed as the “midpoint” $C$, in conformance with the error estimates.

Given the size of the original error bounds on the points $A$ and $B$, it may be surprising that even such a standard simple construction as this — constructing the perpendicular bisector and midpoint of a segment — appears to have comparatively large error propagation, since the shaded red region $C$ is quite large and includes many points that one would not say are close to being the midpoint.  In this sense, the Apollonius perpendicular bisector construction appears to be sensitive to the errors of compass placement.

Is there a better construction? For example, in terms of improving the accuracy at least of the perpendicularity of $PQ$ to $AB$, it would seem to help to use a much larger circle, which would lower the variation in the resulting “right” angle.  But this is partly because we have so far assumed that compass error arises only with the placement of the points of the compass, and not during the course of actually drawing the arc. But of course, one can imagine that errors arise from a flexing of the compass during use, causing it to deviate from circularity, or from slippage, which might reasonably be expected to cause increasing error with the length or degree of the arc, and so on, and such a model of error might have greater errors with large circles.

One could in principle carry out similar analyses for any geometric construction, and use the corresponding results to compare the sensitivity of various methods for constructing the same object, as well as modeling different sources of error.  The goal might be to mount a precise analysis of all the standard constructions and compare competing constructions for accuracy.

There is a literature of papers doing precisely this, and I will try to post some references later (or please do so in the comments, if you have some good ones).

Another approach to error estimation would be to think of the errors at points $A$ and $B$ as probability distributions, centered at $A$ and $B$ and with a certain variation; and one then gets corresponding distributions for the points $P$ and $Q$, which are not rigid shapes as in my diagrams, but qualitatively similar distributions spread out in that region, and a resulting probability distribution for the point $C$.

Finally, let me mention that one might hope to improve the accuracy of a construction, simply by repeating it and averaging the result, or by some other convergence algorithm. For example, as a first step, we might simply perform the Apollonius bisector construction twice, producing midpoint candidates $C_0$ and $C_1$, and we could proceed simply find the midpoint of $C_0C_1$ as a further presumably more accurate midpoint. Or we could iterate in some other manner and hope to converge to the actual midpoint. For example, we could produce seven midpoint candidates, and take the resulting median point.

# Draw an infinite chessboard in perspective, using straightedge only

I’d like to explain to you how to draw chessboards by hand in perfect perspective, using only a straightedge.  In this post, I’ll explain how to construct chessboards of any size, starting with the size of the basic unit square.

This post follows up on the post I made yesterday about how to draw a chessboard in perspective view, using only a straightedge.  That method was a subdivision method, where one starts with the boundary of the desired board, and then subdivides to make a chessboard. Now, we start with the basic square and build up. This method is actually quite efficient for quickly making very large boards in perspective view.

I want to emphasize that this is something that you can actually do, right now. It’s fun! All you need is a piece of paper, a pencil and a straightedge. I’ll wait right here while you gather your materials. Use a ruler or a chop stick (as I did) or the edge of a notebook or the lid of a box. Sit at your table and draw a huge chessboard in perspective. You can totally do this.

Start with a horizon, having two points at infinity (orange), at left and right, and a third point midway between them (brown), which we will call the diagonal infinity. Also, mark the front corner of your chess board (blue).

Extend the front corner to the points at infinity. And then mark off (red) a point that will be a measure of the grid spacing in the chessboard. This will the be size of the front square.

You can extend that point to infinity at the right. This delimits the first rank of the chessboard.

Next, extend the front corner of the board to the diagonal infinity.

The intersection of that diagonal with the previous line determines a point, which when extended to infinity at the left, produces the first square of the chessboard.

And that line determines a new point on the leading rank edge. Extend that point up to the diagonal infinity, which determines another point on the second rank line.

Extend that line to infinity at the left, which determines another point on the leading rank edge.

Continuing in this way, one can produce as many first rank squares as desired. Go ahead and do that. At each step, you extend up to the diagonal infinity, which determines a new point, which when extended to infinity at the left determines another point, and so on.

If you should now reflect on the current diagram, you may notice that we have actually determined many further points in the grid than we have mentioned — and thanks to my daughter Hypatia for noticing this simplification — for there is a whole triangle of further intersection points between the files and the diagonals.

We can use these points (and we do not need them all) to construct the rest of the board, by drawing out the lines to infinity at the right. Thus, we construct the whole chessboard:

One can construct a perspective chessboard of any size this way, and one can simply continue with the construction and make it larger, if desired.

It will look a little better if you add a point at infinity down below (and do so directly below the diagonal point at infinity, but a good distance down below the board), and extend the board downward one level. The corresponding diagram on yesterday’s post might be helpful.

You can now color the tile pattern, and you’ll have a chessboard in perfect perspective view.

If you keep going, you can make extremely large chessboards. In time, I hope that you will come to learn how to complete an infinite chess board in finite time.

# Draw a chessboard in perspective view, using straightedge only

Let me show you how to hand-draw an image of a chessboard in perspective, using only a straightedge. There is no need to measure any distances or to make calculations of any kind.  All you need is a straightedge, paper and your pen or pencil.

Imagine the vast doodling potential!  A person who happens to be stuck in a lecture on some other less interesting topic could easily make one or more elaborate chessboards in perfect perspective, using only a straightedge! Please carry out the construction and then share your resulting images. I would love to see them.

To begin, give yourself a horizon at the top of the page, with two points at infinity (in orange), and also two points (blue) that will become the front and back corner of your chessboard.  You can play around with different arrangements of these points, which will lead ultimately to different perspective views.

Using your straightedge, draw the lines from those reference points to the points at infinity. The resulting enclosed region will ultimately become the main chess board. One can alternatively think of starting with the front edges of the desired board, and then setting a horizon and using that to determine the points at infinity, and then adding the back edges. If you like, you can add a third point at infinity down below, and a front bottom reference point (blue), to give the board some thickness. Construct the lines to the bottom point at infinity.

The result is now the main outline of your chessboard as a rectangular solid.

Next, construct the center point of the top face, by drawing the two diagonals and finding the intersection point. I call this the center point, because it is the point that represents in the perspective view the actual center point of the chessboard, even though this point is not in the “center” of the quadrilateral representing the board. It is remarkable that one can find that point without needing to measure or calculate anything, simply because the two straight lines of the diagonal of the chessboard intersect at that point, and this remains true in the perspective view. This is the key idea that enables the entire construction method to proceed with only a straightedge.

Construct the midpoint lines by drawing the lines from that centerpoint to the points at infinity.

Now you have the chessboard with the main midlines drawn.

Construct the center points of the two diagonal squares, by intersecting the diagonals of each of them.

Construct the lines that extend those points to the points at infinity.

Thus, you have constructed the main 4 x 4 grid.

You can extend the grid lines down to the bottom point at infinity like so:

One can stop with the 4 x 4 board, if desired. Simply add suitable shading, and you’ve completed the 4 x 4 chessboard in perspective.

Alternatively, one can continue with one more iteration to construct the 8 x 8 board. From the 4 x 4 grid (with no shading yet) simply construct the center points of the squares on the main diagonal, and extend those lines to infinity.  This will enable you to draw the 8 x 8 grid lines, and after shading, you’ll have the complete chessboard.

The initial arrangement of points affects the nature of the perspective view. Having the points at infinity very far away will produce something closer to orthoprojection; having them close produces a more extreme perspective, which simulates a view from a vantage point extremely close to the board.

Please give the construction a try! All you need is paper, pencil and a straightedge! Provide links below in the comments to photos of your creations.

Meanwhile, let me point you towards my follow post, How to draw infinite chessboards by hand in perfect perspective, using only a straightedge. The difference between the methods is that the method of this post is about subdividing a given board, and the other method is about generating arbitrarily large chessboards from a given unit square.

I learned these construction method years ago from my CUNY geometer colleague Ilya Kofman.

# The axiom of well-ordered replacement is equivalent to full replacement over Zermelo + foundation

In recent work, Alfredo Roque Freire and I have realized that the axiom of well-ordered replacement is equivalent to the full replacement axiom, over the Zermelo set theory with foundation.

The well-ordered replacement axiom is the scheme asserting that if $I$ is well-ordered and every $i\in I$ has unique $y_i$ satisfying a property $\phi(i,y_i)$, then $\{y_i\mid i\in I\}$ is a set. In other words, the image of a well-ordered set under a first-order definable class function is a set.

Alfredo had introduced the theory Zermelo + foundation + well-ordered replacement, because he had noticed that it was this fragment of ZF that sufficed for an argument we were mounting in a joint project on bi-interpretation. At first, I had found the well-ordered replacement theory a bit awkward, because one can only apply the replacement axiom with well-orderable sets, and without the axiom of choice, it seemed that there were not enough of these to make ordinary set-theoretic arguments possible.

But now we know that in fact, the theory is equivalent to ZF.

Theorem. The axiom of well-ordered replacement is equivalent to full replacement over Zermelo set theory with foundation.

$$\text{ZF}\qquad = \qquad\text{Z} + \text{foundation} + \text{well-ordered replacement}$$

Proof. Assume Zermelo set theory with foundation and well-ordered replacement.

Well-ordered replacement is sufficient to prove that transfinite recursion along any well-order works as expected. One proves that every initial segment of the order admits a unique partial solution of the recursion up to that length, using well-ordered replacement to put them together at limits and overall.

Applying this, it follows that every set has a transitive closure, by iteratively defining $\cup^n x$ and taking the union. And once one has transitive closures, it follows that the foundation axiom can be taken either as the axiom of regularity or as the $\in$-induction scheme, since for any property $\phi$, if there is a set $x$ with $\neg\phi(x)$, then let $A$ be the set of elements $a$ in the transitive closure of $\{x\}$ with $\neg\phi(a)$; an $\in$-minimal element of $A$ is a set $a$ with $\neg\phi(a)$, but $\phi(b)$ for all $b\in a$.

Another application of transfinite recursion shows that the $V_\alpha$ hierarchy exists. Further, we claim that every set $x$ appears in the $V_\alpha$ hierarchy. This is not immediate and requires careful proof. We shall argue by $\in$-induction using foundation. Assume that every element $y\in x$ appears in some $V_\alpha$. Let $\alpha_y$ be least with $y\in V_{\alpha_y}$. The problem is that if $x$ is not well-orderable, we cannot seem to collect these various $\alpha_y$ into a set. Perhaps they are unbounded in the ordinals? No, they are not, by the following argument. Define an equivalence relation $y\sim y’$ iff $\alpha_y=\alpha_{y’}$. It follows that the quotient $x/\sim$ is well-orderable, and thus we can apply well-ordered replacement in order to know that $\{\alpha_y\mid y\in x\}$ exists as a set. The union of this set is an ordinal $\alpha$ with $x\subseteq V_\alpha$ and so $x\in V_{\alpha+1}$. So by $\in$-induction, every set appears in some $V_\alpha$.

The argument establishes the principle: for any set $x$ and any definable class function $F:x\to\text{Ord}$, the image $F\mathrel{\text{”}}x$ is a set. One proves this by defining an equivalence relation $y\sim y’\leftrightarrow F(y)=F(y’)$ and observing that $x/\sim$ is well-orderable.

We can now establish the collection axiom, using a similar idea. Suppose that $x$ is a set and every $y\in x$ has a witness $z$ with $\phi(y,z)$. Every such $z$ appears in some $V_\alpha$, and so we can map each $y\in x$ to the smallest $\alpha_y$ such that there is some $z\in V_{\alpha_y}$ with $\phi(y,z)$. By the observation of the previous paragraph, the set of $\alpha_y$ exists and so there is an ordinal $\alpha$ larger than all of them, and thus $V_\alpha$ serves as a collecting set for $x$ and $\phi$, verifying this instance of collection.

From collection and separation, we can deduce the replacement axiom $\Box$

I’ve realized that this allows me to improve an argument I had made some time ago, concerning Transfinite recursion as a fundamental principle. In that argument, I had proved that ZC + foundation + transfinite recursion is equivalent to ZFC, essentially by showing that the principle of transfinite recursion implies replacement for well-ordered sets. The new realization here is that we do not need the axiom of choice in that argument, since transfinite recursion implies well-ordered replacement, which gives us full replacement by the argument above.

Corollary. The principle of transfinite recursion is equivalent to the replacement axiom over Zermelo set theory with foundation.

$$\text{ZF}\qquad = \qquad\text{Z} + \text{foundation} + \text{transfinite recursion}$$

There is no need for the axiom of choice.

# A question in set-theoretic geology: if $M[G][K]=M[H][K],$ then can we conlude $M[G]=M[H]$?

I was recently asked this interesting question on set-theoretic geology by Iian Smythe, a set-theory post-doc at Rutgers University; the problem arose in the context of one of this current research projects.

Question. Assume that two product forcing extensions are the same $$M[G][K]=M[H][K],$$ where $M[G]$ and $M[H]$ are forcing extensions of $M$ by the same forcing notion $\mathbb{P}$, and $K\subset\mathbb{Q}\in M$ is both $M[G]$ and $M[H]$-generic with respect to this further forcing $\mathbb{Q}$.  Can we conclude that $$M[G]=M[H]\ ?$$ Can we make this conclusion at least in the special case that $\mathbb{P}$ is adding a Cohen real and $\mathbb{Q}$ is collapsing the continuum?

It seems natural to hope for a positive answer, because we are aware of many such situations that arise with forcing, where indeed $M[G]=M[H]$. Nevertheless, the answer is negative. Indeed, we cannot legitimately make this conclusion even when both steps of forcing are adding merely a Cohen real. And such a counterexample implies that there is a counterexample of the type mentioned in the question, simply by performing further collapse forcing.

Theorem. For any countable model $M$ of set theory, there are $M$-generic Cohen reals $c$, $d$ and $e$, such that

1. The Cohen reals $c$ and $e$ are mutually generic over $M$.
2. The Cohen reals $d$ and $e$ are mutually generic over $M$.
3. These two pairs produce the same forcing extension $M[c][e]=M[d][e]$.
4. But  the intermediate models are different $M[c]\neq M[d]$.

Proof. Fix $M$, and let $c$ and $e$ be any two mutually generic Cohen reals over $M$. Let us view them as infinite binary sequences, that is, as elements of Cantor space. In the extension $M[c][e]$, let $d=c+e \mod 2$, in each coordinate. That is, we get $d$ from $c$ by flipping bits, but only on coordinates that are $1$ in $e$. This is the same as applying a bit-flipping automorphism of the forcing, which is available in $M[e]$, but not in $M$. Since $c$ is $M[e]$-generic by reversing the order of forcing, it follows that $d$ also is $M[e]$-generic, since the automorphism is in $M[e]$. Thus, $d$ and $e$ are mutually generic over $M$. Further, $M[c][e]=M[d][e]$, because $M[e][c]=M[e][d]$, as $c$ and $d$ were isomorphic generic filters by an isomorphism in $M[e]$. But finally, $M[c]$ and $M[d]$ are not the same, because from $c$ and $d$ together we can construct $e$, because we can tell exactly which bits were flipped. $\Box$

If one now follows the $e$ forcing with collapse forcing, one achieves a counterexample model of the type mentioned in the question, namely, with $M[c][e*K]=M[d][e*K]$, but $M[c]\neq M[d]$.

I have a feeling that my co-authors on a current paper in progress, Set-theoretic blockchains, on the topic of non-amalgamation in the generic multiverse, will tell me that the argument above is an instance of some of the theorems we prove in the latter part of that paper. (Miha, please tell me in the comments, if you see this, or tell me where I have seen this argument before; I think I made this argument or perhaps seen it before.) The paper is

• M. E. Habič, J. D. Hamkins, L. D. Klausner, J. Verner, and K. J. Williams, “Set-theoretic blockchains,” Archive for Mathematical Logic, 2019.
@ARTICLE{HabicHamkinsKlausnerVernerWilliams2018:Set-theoretic-blockchains,
author = {Miha E. Habič and Joel David Hamkins and Lukas Daniel Klausner and Jonathan Verner and Kameryn J. Williams},
title = {Set-theoretic blockchains},
journal="Archive for Mathematical Logic",
year="2019",
month="Mar",
day="26",
abstract="Given a countable model of set theory, we study the structure of its generic multiverse, the collection of its forcing extensions and ground models, ordered by inclusion. Mostowski showed that any finite poset embeds into the generic multiverse while preserving the nonexistence of upper bounds. We obtain several improvements of his result, using what we call the blockchain construction to build generic objects with varying degrees of mutual genericity. The method accommodates certain infinite posets, and we can realize these embeddings via a wide variety of forcing notions, while providing control over lower bounds as well. We also give a generalization to class forcing in the context of second-order set theory, and exhibit some further structure in the generic multiverse, such as the existence of exact pairs.",
issn="1432-0665",
doi="10.1007/s00153-019-00672-z",
note = {},
abstract = {},
eprint = {1808.01509},
archivePrefix = {arXiv},
primaryClass = {math.LO},
keywords = {},
source = {},
url = {http://wp.me/p5M0LV-1M8},
}

.

# A new proof of the Barwise extension theorem, without infinitary logic

I have found a new proof of the Barwise extension theorem, that wonderful yet quirky result of classical admissible set theory, which says that every countable model of set theory can be extended to a model of $\text{ZFC}+V=L$.


The Barwise extension theorem is both (i) a technical culmination of the pioneering methods of Barwise in admissible set theory and infinitary logic, including the Barwise compactness and completeness theorems and the admissible cover, but also (ii) one of those rare mathematical theorems that is saturated with significance for the philosophy of mathematics and particularly the philosophy of set theory. I discussed the theorem and its philosophical significance at length in my paper, The multiverse perspective on the axiom of constructibility, where I argued that it can change how we look upon the axiom of constructibility and whether this axiom should be considered ‘restrictive,’ as it often is in set theory. Ultimately, the Barwise extension theorem shows how wrong a model of set theory can be, if we should entertain the idea that the set-theoretic universe continues growing beyond it.

Regarding my new proof, below, however, what I find especially interesting about it, if not surprising in light of (i) above, is that it makes no use of Barwise compactness or completeness and indeed, no use of infinitary logic at all! Instead, the new proof uses only classical methods of descriptive set theory concerning the representation of $\Pi^1_1$ sets with well-founded trees, the Levy and Shoenfield absoluteness theorems, the reflection theorem and the Keisler-Morley theorem on elementary extensions via definable ultrapowers. Like the Barwise proof, my proof splits into cases depending on whether the model $M$ is standard or nonstandard, but another interesting thing about it is that with my proof, it is the $\omega$-nonstandard case that is easier, whereas with the Barwise proof, the transitive case was easiest, since one only needed to resort to the admissible cover when $M$ was ill-founded. Barwise splits into cases on well-founded/ill-founded, whereas in my argument, the cases are $\omega$-standard/$\omega$-nonstandard.

To clarify the terms, an end-extension of a model of set theory $\langle M,\in^M\rangle$ is another model $\langle N,\in^N\rangle$, such that the first is a substructure of the second, so that $M\subseteq N$ and $\in^M=\in^N\upharpoonright M$, but further, the new model does not add new elements to sets in $M$. In other words, $M$ is an $\in$-initial segment of $N$, or more precisely: if $a\in^N b\in M$, then $a\in M$ and hence $a\in^M b$.

Set theory, of course, overflows with instances of end-extensions. For example, the rank-initial segments $V_\alpha$ end-extend to their higher instances $V_\beta$, when $\alpha<\beta$; similarly, the hierarchy of the constructible universe $L_\alpha\subseteq L_\beta$ are end-extensions; indeed any transitive set end-extends to all its supersets. The set-theoretic universe $V$ is an end-extension of the constructible universe $L$ and every forcing extension $M[G]$ is an end-extension of its ground model $M$, even when nonstandard. (In particular, one should not confuse end-extensions with rank-extensions, also known as top-extensions, where one insists that all the new sets have higher rank than any ordinal in the smaller model.)

Let’s get into the proof.

Proof. Suppose that $M$ is a model of $\ZF$ set theory. Consider first the case that $M$ is $\omega$-nonstandard. For any particular standard natural number $k$, the reflection theorem ensures that there are arbitrarily high $L_\alpha^M$ satisfying $\ZFC_k+V=L$, where $\ZFC_k$ refers to the first $k$ axioms of $\ZFC$ in a fixed computable enumeration by length. In particular, every countable transitive set $m\in L^M$ has an end-extension to a model of $\ZFC_k+V=L$. By overspill (that is, since the standard cut is not definable), there must be some nonstandard $k$ for which $L^M$ thinks that every countable transitive set $m$ has an end-extension to a model of $\ZFC_k+V=L$, which we may assume is countable. This is a $\Pi^1_2$ statement about $k$, which will therefore also be true in $M$, by the Shoenfield absolutenss theorem. It will also be true in all the elementary extensions of $M$, as well as in their forcing extensions. And indeed, by the Keisler-Morley theorem, the model $M$ has an elementary top extension $M^+$. Let $\theta$ be a new ordinal on top of $M$, and let $m=V_\theta^{M^+}$ be the $\theta$-rank-initial segment of $M^+$, which is a top-extension of $M$. Let $M^+[G]$ be a forcing extension in which $m$ has become countable. Since the $\Pi^1_2$ statement is true in $M^+[G]$, there is an end-extension of $\langle m,\in^{M^+}\rangle$ to a model $\langle N,\in^N\rangle$ that $M^+[G]$ thinks satisfies $\ZFC_k+V=L$. Since $k$ is nonstandard, this theory includes all the $\ZFC$ axioms, and since $m$ end-extends $M$, we have found an end-extension of $M$ to a model of $\ZFC+V=L$, as desired.

It remains to consider the case where $M$ is $\omega$-standard. By the Keisler-Morley theorem, let $M^+$ be an elementary top-extension of $M$. Let $\theta$ be an ordinal of $M^+$ above $M$, and consider the corresponding rank-initial segment $m=V_\theta^{M^+}$, which is a transitive set in $M^+$ that covers $M$. If $\langle m,\in^{M^+}\rangle$ has an end-extension to a model of $\ZFC+V=L$, then we’re done, since such a model would also end-extend $M$. So assume toward contradiction that there is no such end-extension of $m$. Let $M^+[G]$ be a forcing extension in which $m$ has become countable. The assertion that $m$ has no end-extension to a model of $\ZFC+V=L$ is actually true and hence true in $M^+[G]$. This is a $\Pi^1_1$ assertion there about the real coding $m$. Every such assertion has a canonically associated tree, which is well-founded exactly when the statement is true. Since the statement is true in $M^+[G]$, this tree has some countable rank $\lambda$ there. Since these models have the standard $\omega$, the tree associated with the statement is the same for us as inside the model, and since the statement is actually true, the tree is actually well founded. So the rank $\lambda$ must come from the well-founded part of the model.

If $\lambda$ happens to be countable in $L^{M^+}$, then consider the assertion, “there is a countable transitive set, such that the assertion that it has no end-extension to a model of $\ZFC+V=L$ has rank $\lambda$.” This is a $\Sigma_1$ assertion, since it is witnessed by the countable transitive set and the ranking function of the tree associated with the non-extension assertion. Since the parameters are countable, it follows by Levy reflection that the statement is true in $L^{M^+}$. So $L^{M^+}$ has a countable transitive set, such that the assertion that it has no end-extension to a model of $\ZFC+V=L$ has rank $\lambda$. But since $\lambda$ is actually well-founded, the statement would have to be actually true; but it isn’t, since $L^{M^+}$ itself is such an extension, a contradiction.

So we may assume $\lambda$ is uncountable in $M^+$. In this case, since $\lambda$ was actually well-ordered, it follows that $L^M$ is well-founded beyond its $\omega_1$. Consider the statement “there is a countable transitive set having no end-extension to a model of $\ZFC+V=L$.” This is a $\Sigma^1_2$ sentence, which is true in $M^+[G]$ by our assumption about $m$, and so by Shoenfield absoluteness, it is true in $L^{M^+}$ and hence also $L^M$. So $L^M$ thinks there is a countable transitive set $b$ having no end-extension to a model of $\ZFC+V=L$. This is a $\Pi^1_1$ assertion about $b$, whose truth is witnessed in $L^M$ by a ranking of the associated tree. Since this rank would be countable in $L^M$ and this model is well-founded up to its $\omega_1$, the tree must be actually well-founded. But this is impossible, since it is not actually true that $b$ has no such end-extension, since $L^M$ itself is such an end-extension of $b$. Contradiction. $\Box$

One can prove a somewhat stronger version of the theorem, as follows.

Theorem. For any countable model $M$ of $\ZF$, with an inner model $W\models\ZFC$, and any statement $\phi$ true in $W$, there is an end-extension of $M$ to a model of $\ZFC+\phi$. Furthermore, one can arrange that every set of $M$ is countable in the extension model.

In particular, one can find end-extensions of $\ZFC+V=L+\phi$, for any statement $\phi$ true in $L^M$.

Proof. Carry out the same proof as above, except in all the statements, ask for end-extensions of $\ZFC+\phi$, instead of end-extensions of $\ZFC+V=L$, and also ask that the set in question become countable in that extension. The final contradictions are obtained by the fact that the countable transitive sets in $L^M$ do have end-extensions like that, in which they are countable, since $W$ is such an end-extension. $\Box$

For example, we can make the following further examples.

Corollaries.

1. Every countable model $M$ of $\ZFC$ with a measurable cardinal has an end-extension to a model $N$ of $\ZFC+V=L[\mu]$.
2. Every countable model $M$ of $\ZFC$ with extender-based large cardinals has an end-extension to a model $N$ satisfying $\ZFC+V=L[\vec E]$.
3. Every countable model $M$ of $\ZFC$ with infinitely many Woodin cardinals has an end-extension to a model $N$ of $\ZF+\text{AD}+V=L(\mathbb{R})$.

And in each case, we can furthermore arrange that every set of $M$ is countable in the extension model $N$.

This proof grew out of a project on the $\Sigma_1$-definable universal finite set, which I am currently undertaking with Kameryn Williams and Philip Welch.

Jon Barwise. Infinitary methods in the model theory of set theory. In Logic
Colloquium ’69 (Proc. Summer School and Colloq., Manchester, 1969), pages
53–66. North-Holland, Amsterdam, 1971.

# Infinite Sudoku and the Sudoku game

Consider what I call the Sudoku game, recently introduced in the MathOverflow question Who wins two-player Sudoku? posted by Christopher King (user PyRulez). Two players take turns placing numbers on a Sudoku board, obeying the rule that they must never explicitly violate the Sudoku condition: the numbers on any row, column or sub-board square must never repeat. The first player who cannot continue legal play loses. Who wins the game? What is the winning strategy?

The game is not about building a global Sudoku solution, since a move can be legal in this game even when it is not part of any global Sudoku solution, provided only that it doesn’t yet explicitly violate the Sudoku condition. Rather, the Sudoku game is about trying to trap your opponent in a maximal such position, a position which does not yet explicitly violate the Sudoku condition but which cannot be further extended.

In my answer to the question on MathOverflow, I followed an idea suggested to me by my daughter Hypatia, namely that on even-sized boards $n^2\times n^2$ where $n$ is even, then the second player can win with a mirroring strategy: simply copy the opponent’s moves in reflected mirror image through the center of the board. In this way, the second player ensures that the position on the board is always symmetric after her play, and so if the previous move was safe, then her move also will be safe by symmetry. This is therefore a winning strategy for the second player, since any violation of the Sudoku condition will arise on the opponent’s play.

This argument works on even-sized boards precisely because the reflection of every row, column and sub-board square is a totally different row, column and sub-board square, and so any new violation of the Sudoku conditions would reflect to a violation that was already there. The mirror strategy definitely does not work on the odd-sized boards, including the main $9\times 9$ case, since if the opponent plays on the central row, copying directly would immediately introduce a Sudoku violation.

After posting that answer, Orson Peters (user orlp) pointed out that one can modify it to form a winning strategy for the first player on odd-sized boards, including the main $9\times 9$ case. In this case, let the first player begin by playing $5$ in the center square, and then afterwards copy the opponent’s moves, but with the ten’s complement at the reflected location. So if the opponent plays $x$, then the first player plays $10-x$ at the reflected location. In this way, the first player can ensure that the board is ten’s complement symmetric after her moves. The point is that again this is sufficient to know that she will never introduce a violation, since if her $10-x$ appears twice in some row, column or sub-board square, then $x$ must have already appeared twice in the reflected row, column or sub-board square before that move.

This idea is fully general for odd-sized Sudoku boards $n^2\times n^2$, where $n$ is odd. If $n=2k-1$, then the first player starts with $k$ in the very center and afterward plays the $2k$-complement of her opponent’s move at the reflected location.

Conclusion.

1. On even-sized Sudoku boards, the second player wins the Sudoku game by the mirror copying strategy.
2. On odd-sized Sudoku boards, the first players wins the Sudoku game by the complement-mirror copying strategy.

Note that on the even boards, the second player could also play complement mirror copying just as successfully.

What I really want to tell you about, however, is the infinite Sudoku game (following a suggestion of Sam Hopkins). Suppose that we try to play the Sudoku game on a board whose subboard squares are $\mathbb{Z}\times\mathbb{Z}$, so that the full board is a $\mathbb{Z}\times\mathbb{Z}$ array of those squares, making $\mathbb{Z}^2\times\mathbb{Z}^2$ altogether. (Or perhaps you might prefer the board $\mathbb{N}^2\times\mathbb{N}^2$?)

One thing to notice is that on an infinite board, it is no longer possible to get trapped at a finite stage of play, since every finite position can be extended simply by playing a totally new label from the set of labels; such a move would never lead to a new violation of the explicit Sudoku condition.

For this reason, I should like to introduce the Sudoku Solver-Spoiler game variation as follows. There are two players: the Sudoku Solver and the Sudoku Spoiler. The Solver is trying to build a global Sudoku solution on the board, while the Spoiler is trying to prevent this. Both players must obey the Sudoku condition that labels are never to be explicitly repeated in any row, column or sub-board square. On an infinite board, the game proceeds transfinitely, until the board is filled or there are no legal moves. The Solver wins a play of the game, if she successfully builds a global Sudoku solution, which means not only that every location has a label and there are no repetitions in any row, column or sub-board square, but also that every label in fact appears in every row, column and sub-board square. That is, to count as a solution, the labels on any row, column and sub-board square must be a bijection with the set of labels. (On infinite boards, this is a stronger requirement than merely insisting on no repetitions.)

The Solver-Spoiler game makes sense in complete generality on any set $S$, whether finite or infinite. The sub-boards are $S^2=S\times S$, and one has an $S\times S$ array of them, so $S^2\times S^2$ for the whole board. Every row and column has the same size as the sub-board square $S^2$, and the set of labels should also have this size.

Upon reflection, one realizes that what matters about $S$ is just its cardinality, and we really have for every cardinal $\kappa$ the $\kappa$-Sudoku Solver-Spoiler game, whose board is $\kappa^2\times\kappa^2$, a $\kappa\times\kappa$ array of $\kappa\times\kappa$ sub-boards. In particular, the game $\mathbb{Z}^2\times\mathbb{Z}^2$ is actually isomorphic to the game $\mathbb{N}^2\times\mathbb{N}^2$, despite what might feel initially like a very different board geometry.

What I claim is that the Solver has a winning strategy in the all the infinite Sudoku Solver-Spoiler games, in a very general and robust manner.

Theorem. For every infinite cardinal $\kappa$, the Solver has a winning strategy to win the $\kappa$-Sudoku Solver-Spoiler game.

• The strategy will win in $\kappa$ many moves, producing a full Sudoku solution.
• The Solver can win whether she goes first or second, starting from any legal position of size less than $\kappa$.
• The Solver can win even when the Spoiler is allowed to play finitely many labels at once on each turn, or fewer than $\kappa$ many moves (if $\kappa$ is regular), even if the Solver is only allowed one move each turn.
• In the countably infinite Sudoku game, the Solver can win even if the Spoiler is allowed to make infinitely many moves at once, provided only that the resulting position can in principle be extended to a full solution.

Proof. Consider first the countably infinite Sudoku game, and assume the initial position is finite and that the Spoiler will make finitely many moves on each turn. Consider what it means for the Solver to win at the limit. It means, first of all, that there are no explicit repetitions in any row, column or sub-board. This requirement will be ensured since it is part of the rules for legal play not to violate it. Next, the Solver wants to ensure that every square has a label on it and that every label appears at least once in every row, every column and every sub-board. If we think of these as individual specific requirements, we have countably many requirements in all, and I claim that we can arrange that the Solver will simply satisfy the $n^{th}$ requirement on her $n^{th}$ play. Given any finite position, she can always find something to place in any given square, using a totally new label if need be. Given any finite position, any row and any particular label $k$, since can always find a place on that row to place the label, which has no conflict with any column or sub-board, since there are infinitely many to choose from and only finitely many conflicts. Similarly with columns and sub-boards. So each of the requirements can always be fulfilled one-at-a-time, and so in $\omega$ many moves she can produce a full solution.

The argument works equally well no matter who goes first or if the Spoiler makes arbitrary finite play, or indeed even infinite play, provided that the play is part of some global solution (perhaps a different one each time), since on each move the Solve can simply meet the requirement by using that solution at that stage.

An essentially similar argument works when $\kappa$ is uncountable, although now the play will proceed for $\kappa$ many steps. Assuming $\kappa^2=\kappa$, a consequence of the axiom of choice, there are $\kappa$ many requirements to meet, and the Solve can meet requirement $\alpha$ on the $\alpha^{th}$ move. If $\kappa$ is regular, we can again allow the Spoiler to make arbitrary size-less-than-$\kappa$ size moves, so that at any stage of play before $\kappa$ the position will still be size less than $\kappa$. (If $\kappa$ is singular, one can allow Spoiler to make finitely many moves at once or indeed even some uniform bounded size $\delta<\kappa$ many moves at once. $\Box$

I find it interesting to draw out the following aspect of the argument:

Observation. Every finite labeling of an infinite Sudoku board that does not yet explicitly violate the Sudoku condition can be extended to a global solution.

Similarly, any size less than $\kappa$ labeling that does not yet explicitly violate the Sudoku condition can be extended to a global solution of the $\kappa$-Sudoku board for any infinite cardinal $\kappa$.

What about asymmetric boards? It has come to my attention that people sometimes look at asymmetric Sudoku boards, whose sub-boards are not square, such as in the six-by-six Sudoku case. In general, one could take Sudoku boards to consist of a $\lambda\times\kappa$ array of sub-boards of size $\kappa\times\lambda$, where $\kappa$ and $\lambda$ are cardinals, not necessarily the same size and not necessarily both infinite or both finite. How does this affect the arguments I’ve given?

In the finite $(n\times m)\times (m\times n)$ case, if one of the numbers is even, then it seems to me that the reflection through the origin strategy works for the second player just as before. And if both are odd, then the first player can again play in the center square and use the mirror-complement strategy to trap the opponent. So that analysis will work fine.

In the case $(\kappa\times\lambda)\times(\lambda\times\kappa)$ where $\lambda\leq\kappa$ and $\kappa=\lambda\kappa$ is infinite, then the proof of the theorem seems to break, since if $\lambda<\kappa$, then with only $\lambda$ many moves, say putting a common symbol in each of the $\lambda$ many rectangles across a row, we can rule out that symbol in a fixed row. So this is a configuration of size less than $\kappa$ that cannot be extended to a full solution. For this reason, it seems likely to me that the Spoiler can win the Sudoko Solver-Spoiler game in the infinite asymmetric case.

Finally, let’s consider the Sudoku Solver-Spoiler game in the purely finite case, which actually is a very natural game, perhaps more natural than what I called the Sudoku game above. It seems to me that the Spoiler should be able to win the Solver-Spoiler game on any nontrivial finite board. But I don’t yet have an argument proving this. I asked a question on MathOverflow: The Sudoku game: Solver-Spoiler variation.

# Different set theories are never bi-interpretable

I was fascinated recently to discover something I hadn’t realized about relative interpretability in set theory, and I’d like to share it here. Namely,

Different set theories extending ZF are never bi-interpretable!

For example, ZF and ZFC are not bi-interpretable, and neither are ZFC and ZFC+CH, nor ZFC and ZFC+$\neg$CH, despite the fact that all these theories are equiconsistent. The basic fact is that there are no nontrivial instances of bi-interpretation amongst the models of ZF set theory. This is surprising, and could even be seen as shocking, in light of the philosophical remarks one sometimes hears asserted in the philosophy of set theory that what is going on with the various set-theoretic translations from large cardinals to determinacy to inner model theory, to mention a central example, is that we can interpret between these theories and consequently it doesn’t much matter which context is taken as fundamental, since we can translate from one context to another without loss.

The bi-interpretation result shows that these interpretations do not and cannot rise to the level of bi-interpretations of theories — the most robust form of mutual relative interpretability — and consequently, the translations inevitably must involve a loss of information.

To be sure, set theorists classify the various set-theoretic principles and theories into a hierarchy, often organized by consistency strength or by other notions of interpretative power, using forcing or definable inner models. From any model of ZF, for example, we can construct a model of ZFC, and from any model of ZFC, we can construct models of ZFC+CH or ZFC+$\neg$CH and so on. From models with sufficient large cardinals we can construct models with determinacy or inner-model-theoretic fine structure and vice versa. And while we have relative consistency results and equiconsistencies and even mutual interpretations, we will have no nontrivial bi-interpretations.

(I had proved the theorem a few weeks ago in joint work with Alfredo Roque Freire, who is visiting me in New York this year. We subsequently learned, however, that this was a rediscovery of results that have evidently been proved independently by various authors. Albert Visser proves the case of PA in his paper, “Categories of theories and interpretations,” Logic in Tehran, 284–341, Lect. Notes Log., 26, Assoc. Symbol. Logic, La Jolla, CA, 2006, (pdf, see pp. 52-55). Ali Enayat gave a nice model-theoretic argument for showing specifically that ZF and ZFC are not bi-interpretable, using the fact that ZFC models can have no involutions in their automorphism groups, but ZF models can; and he proved the general version of the theorem, for ZF, second-order arithmetic $Z_2$ and second-order set theory KM in his 2016 article, A. Enayat, “Variations on a Visserian theme,” in Liber Amicorum Alberti : a tribute to Albert Visser / Jan van Eijck, Rosalie Iemhoff and Joost J. Joosten (eds.) Pages, 99-110. ISBN, 978-1848902046. College Publications, London. The ZF version was apparently also observed independently by Harvey Friedman, Visser and Fedor Pakhomov.)

Meanwhile, let me explain our argument. Recall from model theory that one theory $S$ is interpreted in another theory $T$, if in any model of the latter theory $M\models T$, we can define (and uniformly so in any such model) a certain domain $N\subset M^k$ and relations and functions on that domain so as to make $N$ a model of $S$. For example, the theory of algebraically closed fields of characteristic zero is interpreted in the theory of real-closed fields, since in any real-closed field $R$, we can consider pairs $(a,b)$, thinking of them as $a+bi$, and define addition and multiplication on those pairs in such a way so as to construct an algebraically closed field of characteristic zero.

Two theories are thus mutually interpretable, if each of them is interpretable in the other. Such theories are necessarily equiconsistent, since from any model of one of them we can produce a model of the other.

Note that mutual interpretability, however, does not insist that the two translations are inverse to each other, even up to isomorphism. One can start with a model of the first theory $M\models T$ and define the interpreted model $N\models S$ of the second theory, which has a subsequent model of the first theory again $\bar M\models T$ inside it. But the definition does not insist on any particular connection between $M$ and $\bar M$, and these models need not be isomorphic nor even elementarily equivalent in general.

By addressing this, one arrives at a stronger and more robust form of mutual interpretability. Namely, two theories $S$ and $T$ are bi-interpretable, if they are mutually interpretable in such a way that the models can see that the interpretations are inverse. That is, for any model $M$ of the theory $T$, if one defines the interpreted model $N\models S$ inside it, and then defines the interpreted model $\bar M$ of $T$ inside $N$, then $M$ is isomorphic to $\bar M$ by a definable isomorphism in $M$, and uniformly so (and the same with the theories in the other direction). Thus, every model of one of the theories can see exactly how it itself arises definably in the interpreted model of the other theory.

For example, the theory of linear orders $\leq$ is bi-interpretable with the theory of strict linear order $<$, since from any linear order $\leq$ we can define the corresponding strict linear order $<$ on the same domain, and from any strict linear order $<$ we can define the corresponding linear order $\leq$, and doing it twice brings us back again to the same order.

For a richer example, the theory PA is bi-interpretable with the finite set theory $\text{ZF}^{\neg\infty}$, where one drops the infinity axiom from ZF and replaces it with the negation of infinity, and where one has the $\in$-induction scheme in place of the foundation axiom. The interpretation is via the Ackerman encoding of hereditary finite sets in arithmetic, so that $n\mathrel{E} m$ just in case the $n^{th}$ binary digit of $m$ is $1$. If one starts with the standard model $\mathbb{N}$, then the resulting structure $\langle\mathbb{N},E\rangle$ is isomorphic to the set $\langle\text{HF},\in\rangle$ of hereditarily finite sets. More generally, by carrying out the Ackermann encoding in any model of PA, one thereby defines a model of $\text{ZF}^{\neg\infty}$, whose natural numbers are isomorphic to the original model of PA, and these translations make a bi-interpretation.

We are now ready to prove that this bi-interpretation situation does not occur with different set theories extending ZF.

Theorem. Distinct set theories extending ZF are never bi-interpretable. Indeed, there is not a single model-theoretic instance of bi-interpretation occurring with models of different set theories extending ZF.

Proof. I mean “distinct” here in the sense that the two theories are not logically equivalent; they do not have all the same theorems. Suppose that we have a bi-interpretation instance of the theories $S$ and $T$ extending ZF. That is, suppose we have a model $\langle M,\in\rangle\models T$ of the one theory, and inside $M$, we can define an interpreted model of the other theory $\langle N,\in^N\rangle\models S$, so the domain of $N$ is a definable class in $M$ and the membership relation $\in^N$ is a definable relation on that class in $M$; and furthermore, inside $\langle N,\in^N\rangle$, we have a definable structure $\langle\bar M,\in^{\bar M}\rangle$ which is a model of $T$ again and isomorphic to $\langle M,\in^M\rangle$ by an isomorphism that is definable in $\langle M,\in^M\rangle$. So $M$ can define the map $a\mapsto \bar a$ that forms an isomorphism of $\langle M,\in^M\rangle$ with $\langle \bar M,\in^{\bar M}\rangle$. Our argument will work whether we allow parameters in any of these definitions or not.

I claim that $N$ must think the ordinals of $\bar M$ are well-founded, for otherwise it would have some bounded cut $A$ in the ordinals of $\bar M$ with no least upper bound, and this set $A$ when pulled back pointwise by the isomorphism of $M$ with $\bar M$ would mean that $M$ has a cut in its own ordinals with no least upper bound; but this cannot happen in ZF.

If the ordinals of $N$ and $\bar M$ are isomorphic in $N$, then all three models have isomorphic ordinals in $M$, and in this case, $\langle M,\in^M\rangle$ thinks that $\langle N,\in^N\rangle$ is a well-founded extensional relation of rank $\text{Ord}$. Such a relation must be set-like (since there can be no least instance where the predecessors form a proper class), and so $M$ can perform the Mostowski collapse of $\in^N$, thereby realizing $N$ as a transitive class $N\subseteq M$ with $\in^N=\in^M\upharpoonright N$. Similarly, by collapsing we may assume $\bar M\subseteq N$ and $\in^{\bar M}=\in^M\upharpoonright\bar M$. So the situation consists of inner models $\bar M\subseteq N\subseteq M$ and $\langle \bar M,\in^M\rangle$ is isomorphic to $\langle M,\in^M\rangle$ in $M$. This is impossible unless all three models are identical, since a simple $\in^M$-induction shows that $\pi(y)=y$ for all $y$, because if this is true for the elements of $y$, then $\pi(y)=\{\pi(x)\mid x\in y\}=\{x\mid x\in y\}=y$. So $\bar M=N=M$ and so $N$ and $M$ satisfy the same theory, contrary to assumption.

If the ordinals of $\bar M$ are isomorphic to a proper initial segment of the ordinals of $N$, then a similar Mostowski collapse argument would show that $\langle\bar M,\in^{\bar M}\rangle$ is isomorphic in $N$ to a transitive set in $N$. Since this structure in $N$ would have a truth predicate in $N$, we would be able to pull this back via the isomorphism to define (from parameters) a truth predicate for $M$ in $M$, contrary to Tarski’s theorem on the non-definability of truth.

The remaining case occurs when the ordinals of $N$ are isomorphic in $N$ to an initial segment of the ordinals of $\bar M$. But this would mean that from the perspective of $M$, the model $\langle N,\in^N\rangle$ has some ordinal rank height, which would mean by the Mostowski collapse argument that $M$ thinks $\langle N,\in^N\rangle$ is isomorphic to a transitive set. But this contradicts the fact that $M$ has an injection of $M$ into $N$. $\Box$

It follows that although ZF and ZFC are equiconsistent, they are not bi-interpretable. Similarly, ZFC and ZFC+CH and ZFC+$\neg$CH are equiconsistent, but no pair of them is bi-interpretable. And again with all the various equiconsistency results concerning large cardinals.

A similar argument works with PA to show that different extensions of PA are never bi-interpretable.

# Nonstandard models of arithmetic arise in the complex numbers

I’d like to explain that one may find numerous nonstandard models of arithmetic as substructures of the field of complex numbers.

The issue arose yesterday at Hans Schoutens’s talk for the CUNY Logic Workshop. The main focus of the talk was the question, for a given algebraically closed field $k$ of characteristic zero and a given model of arithmetic $\Gamma\models$PA, whether $\Gamma$ and $k$ were jointly realizable as the set of powers (as he defines it) and the set of units of a model $S$ of the generalized theory of polynomial rings over fields. Very interesting stuff.

During the talk, a side question arose, concerning exactly which models of PA arise as substructures of the field of complex numbers.

Question. Which models of PA arise as substructures of the field of complex numbers $\langle\mathbb{C},+,\cdot\rangle$?

Of course the standard model $\mathbb{N}$ arises this way, and some people thought at first it should be difficult to realize nonstandard models of PA as substructures of $\mathbb{C}$. After some back and forth, the question was ultimately answered by Alfred Dolich in the pub after the seminar, and I’d like to give his argument here (but see the Mlček reference below).  This is a case where a problem that was initially confusing becomes completely clear!

Theorem. Every model of PA of size at most continuum arises as a sub-semiring of the field of complex numbers $\langle\mathbb{C},+,\cdot\rangle$.

Proof. Suppose that $M$ is a model of PA of size at most continuum. Inside $M$, we may form $M$’s version of the algebraic numbers $A=\bar{\mathbb{Q}}^M$, the field that $M$ thinks is the algebraic closure of its version of the rationals. So $A$ is an algebraically closed field of characteristic zero, which has an elementary extension to such a field of size continuum. Since the theory of algebraically closed fields of characteristic zero is categorical in all uncountable powers, it follows that $A$ is isomorphic to a submodel of $\mathbb{C}$. Since $M$ itself is isomorphic to a substructure of its rationals $\mathbb{Q}^M$, which sit inside $A$, it follows that $M$ is isomorphic to a substructure of $\mathbb{C}$, as claimed. QED

In particular, every countable model of PA can be found as a substructure of the complex numbers.

Essentially the same argument shows the following.

Theorem. If $k$ is an uncountable algebraically closed field of characteristic zero, then every model of arithmetic $M\models$PA of size at most the cardinality of $k$ embeds into $k$.

I’ve realized that the same collection of ideas shows the following striking way to look upon the complex numbers:

Theorem. The complex numbers $\mathbb{C}$ can be viewed as a nonstandard version of the algebraic numbers $\bar{\mathbb{Q}}^M$ inside a nonstandard model $M$ of PA. Indeed, for every uncountable algebraically closed field $F$ of characteristic zero and every model of arithmetic $M\models$PA of the same cardinality, the field $F$ is isomorphic to the nonstandard algebraic numbers $\bar{\mathbb{Q}}^M$ as $M$ sees them.

Proof. Fix any such field $F$, such as the complex numbers themselves, and consider any nonstandard model of arithmetic $M$ of the same cardinality. The field $\bar{\mathbb{Q}}^M$, which is $M$’s nonstandard version of the algebraic numbers, is an algebraically closed field of characteristic zero and same uncountable size as $F$. By categoricity, these fields are isomorphic. $\Box$

I had suspected that these results were folklore in the model-theoretic community, and it has come to my attention that proper credit for the main observation of this post seems to be due to Jozef Mlček, who proved it in 1973. Thanks to Jerome Tauber for the reference, which he provided in the comments.

# Discussion of McCallum’s paper on Reinhardt cardinals in ZF

Update: Rupert has withdrawn his claim. See the final bullet point below.

Rupert McCallum has posted a new paper to the mathematics arXiv

Rupert McCallum, The choiceless cardinals are inconsistent, mathematics arXiv 2017: 1712.09678.

He is claiming to establish the Kunen inconsistency in ZF, without the axiom of choice, which is a long-standing open question. In particular, this would refute the Reinhardt cardinals in ZF and all the stronger ZF large cardinals that have been studied.

If correct, this result will constitute a central advance in large cardinal set theory.

I am making this post to provide a place to discuss the proof and any questions that people might have about it. Please feel free to post comments with questions or answers to other questions that have been posted. I will plan to periodically summarize things in the main body of this post as the discussion proceeds.

• My first question concerns lemma 0.4, where he claims that $j’\upharpoonright V_{\lambda+2}^N$ is a definable class in $N$. He needs this to get the embedding into $N$, but I don’t see why the embedding should be definable here.
• I wrote to Rupert about this concern, and he replied that it may be an issue, and that he intends to post a new version of his paper, where he may retreat to the weaker claim refuting only the super-Reinhardt cardinals.
• The updated draft is now available. Follow the link above. It will become also available on the arXiv later this week.
• The second January 2 draft has a new section claiming again the original refutation of Reinhardt cardinals.
• New draft January 3. Rupert has reportedly been in communication with Matteo Viale about his result.
• Rupert has announced (Jan 3) that he is going to take a week or so to produce a careful rewrite.
• He has made available his new draft, January 7. It will also be posted on the arXiv.
• January 8:  In light of the issues identified on this blog, especially the issue mentioned by Gabe, Rupert has sent me an email stating (and asking me to post here) that he is planning to think it through over the next couple of weeks and will then make some kind of statement about whether he thinks he can save the argument.  For the moment, therefore, it seems that we should consider the proof to be on hold.
• January 24: After consideration, Rupert has withdrawn the claim, sending me the following message:

“Gabriel has very kindly given me extensive feedback on many different drafts. I attach the latest version which he commented on [January 24 draft above]. He has identified the flaw, namely that on page 3 I claim that $\exists n \forall Y \in W_n \psi(Y)$ if and only if $\forall Y \in U \psi(Y)$. This claim is not justified, and this means that there is no way that is apparent to me to rescue the proof of Lemma 1.2. Gabriel has directed me to a paper of Laver which does indeed show that my mapping e is an elementary embedding but which does not give the stronger claim that I want.

…So, I withdraw my claim. It is possible that this method of proof can work somehow, but some new insight is needed to make it work.”

-Rupert McCallum, January 24, 2018

# Escape!

Here is an interesting game I heard a few days ago from one of my undergraduate students; I’m not sure of the provenance.

The game is played with stones on a grid, which extends indefinitely upward and to the right, like the lattice $\mathbb{N}\times\mathbb{N}$.  The game begins with three stones in the squares nearest the origin at the lower left.  The goal of the game is to vacate all stones from those three squares. At any stage of the game, you may remove a stone and replace it with two stones, one in the square above and one in the square to the right, provided that both of those squares are currently unoccupied.

For example, here is a sample play.

Question. Can you play so as completely to vacate the yellow corner region?

One needs only to move the other stones out of the way so that the corner stones have room to move out. Can you do it? It isn’t so easy, but I encourage you to try.

Here is an online version of the game that I coded up quickly in Scratch: Escape!

My student mentioned the problem to me and some other students in my office on the day of the final exam, and we puzzled over it, but then it was time for the final exam. So I had a chance to think about it while giving the exam and came upon a solution. I’ll post my answer later on, but I’d like to give everyone a chance to think about it first.

Solution. Here is the solution I hit upon, and it seems that many others also found this solution. The main idea is to assign an invariant to the game positions. Let us assign weights to the squares in the lattice according to the following pattern. We give the corner square weight $1/2$, the next diagonal of squares $1/4$ each, and then $1/8$, and so on throughout the whole playing board. Every square should get a corresponding weight according to the indicated pattern.

The weights are specifically arranged so that making a move in the game preserves the total weight of the occupied squares. That is, the total weight of the occupied squares is invariant as play proceeds, because moving a stone with weight $1/2^k$ will create two stones of weight $1/2^{k+1}$, which adds up to the same. Since the original three stones have total weight $\frac 12+\frac14+\frac14=1$, it follows that the total weight remains $1$ after every move in the game.

Meanwhile, let us consider the total weight of all the squares on the board. If you consider the bottom row only, the weights add to $\frac12+\frac14+\frac18+\cdots$, which is the geometric series with sum $1$. The next row has total weight $\frac14+\frac18+\frac1{16}+\cdots$, which adds to $1/2$. And the next adds to $1/4$ and so on. So the total weight of all the squares on the board is $1+\frac12+\frac14+\cdots$, which is $2$.  Since we have $k$ stones with weight $1/2^k$, another way to think about it is that we are essentially establishing the sum $\sum_k\frac k{2^k}=2$.

The subtle conclusion is that after any finite number of moves, only finitely many of those other squares are occupied, and so some of them remain empty. So after only finitely many moves, the total weight of the occupied squares off of the original L-shape is strictly less than $1$. Since the total weight of all the occupied squares is exactly $1$, this means that the L-shape has not been vacated.

So it is impossible to vacate the original L-shape in finitely many moves. $\Box$

Suppose that we relax the one-stone-per-square requirement, and allow you to stack several stones on a single square, provided that you eventually unstack them. In other words, can you play the stacked version of the game, so as to vacate the original three squares, provided that all the piled-up stones eventually are unstacked?

No, it is impossible! And the proof is the same invariant-weight argument as above. The invariance argument does not rely on the one-stone-per-square rule during play, since it is still an invariant if one multiplies the weight of a square by the number of stones resting upon it. So we cannot transform the original stones, with total weight $1$, to any finite number of stones on the rest of the board (with one stone per square in the final position), since those other squares do not have sufficient weight to add up to $1$, even if we allow them to be stacked during intermediate stages of play.

Meanwhile, let us consider playing the game on a finite $n\times n$ board, with the rule modified so that stones that would be created in row or column $n+1$ in the infinite game simply do not materialize in the $n\times n$ game. This breaks the proof, since the weight is no longer an invariant for moves on the outer edges. Furthermore, one can win this version of the game. It is easy to see that one can systematically vacate all stones on the upper and outer edges, simply by moving any of them that is available, pushing the remaining stones closer to the outer corner and into oblivion. Similarly, one can vacate the penultimate outer edges, by doing the same thing, which will push stones into the outer edges, which can then be vacated. By reverse induction from the outer edges in, one can vacate every single row and column. Thus, for play on this finite board with the modified rule on the outer edges, one can vacate the entire $n\times n$ board!

Indeed, in the finite $n\times n$ version of the game, there is no way to lose! If one simply continues making legal moves as long as this is possible, then the board will eventually be completely vacated. To see this, notice first that if there are stones on the board, then there is at least one legal move. Suppose that we can make an infinite sequence of legal moves on the $n\times n$ board. Since there are only finitely many squares, some of the squares must have been moved-upon infinitely often. If you consider such a square closest to the origin (or of minimal weight in the scheme of weights above), then since the lower squares are activated only finitely often, it is clear that eventually the given square will replenished for the last time. So it cannot have been activated infinitely often. (Alternatively, argue by induction on squares from the lower left that they are moved-upon at most finitely often.) Indeed, I claim that the number of steps to win, vacating the $n\times n$ board, does not depend on the order of play. One can see this by thinking about the path of a given stone and its clones through the board, ignoring the requirement that a given square carries only one stone. That is, let us make all the moves in parallel time. Since there is no interaction between the stones that would otherwise interfere, it is clear that the number of stones appearing on a given square in total is independent of the order of play. A tenacious person could calculate the sum exactly: each square is becomes occupied by a number of stones that is equal to the number of grid paths to it from one of the original three stones, and one could use this sum to calculate the total length of play on the $n\times n$ board.

# Still don’t know, an epistemic logic puzzle

Here is a epistemic logic puzzle that I wrote for my students in the undergraduate logic course I have been teaching this semester at the College of Staten Island at CUNY.  We had covered some similar puzzles in lecture, including Cheryl’s Birthday and the blue-eyed islanders.

Bonus Question. Suppose that Alice and Bob are each given a different fraction, of the form $\frac{1}{n}$, where $n$ is a positive integer, and it is commonly known to them that they each know only their own number and that it is different from the other one. The following conversation ensues.

JDH: I privately gave you each a different rational number of the form $\frac{1}{n}$. Who has the larger number?

Alice: I don’t know.

Bob: I don’t know either.

Alice: I still don’t know.

Bob: Suddenly, now I know who has the larger number.

Alice: In that case, I know both numbers.

What numbers were they given?

Give the problem a try! See the solution posted below.

Meanwhile, for a transfinite epistemic logic challenge — considerably more difficult — see my puzzle Cheryl’s rational gifts.

Solution.
When Alice says she doesn’t know, in her first remark, the meaning is exactly that she doesn’t have $\frac 11$, since that is only way she could have known who had the larger number.  When Bob replies after this that he doesn’t know, then it must be that he also doesn’t have $\frac 11$, but also that he doesn’t have $\frac 12$, since in either of these cases he would know that he had the largest number, but with any other number, he couldn’t be sure. Alice replies to this that she still doesn’t know, and the content of this remark is that Alice has neither $\frac 12$ nor $\frac 13$, since in either of these cases, and only in these cases, she would have known who has the larger number. Bob replies that suddenly, he now knows who has the larger number. The only way this could happen is if he had either $\frac 13$ or $\frac 14$, since in either of these cases he would have the larger number, but otherwise he wouldn’t know whether Alice had $\frac 14$ or not. But we can’t be sure yet whether Bob has $\frac 13$ or $\frac 14$. When Alice says that now she knows both numbers, however, then it must be because the information that she has allows her to deduce between the two possibilities for Bob. If she had $\frac 15$ or smaller, she wouldn’t be able to distinguish the two possibilities for Bob. Since we already ruled out $\frac 13$ for her, she must have $\frac 14$. So Alice has $\frac 14$ and Bob has $\frac 13$.

Many of the commentators came to the same conclusion. Congratulations to all who solved the problem! See also the answers posted on my math.stackexchange question and on Twitter: