Counting to Infinity and Beyond

Anyone can learn to count in the ordinals, even a child, and so let us learn how to count to $\omega^2$, the first compound limit ordinal.

The large-format poster is available:

Some close-up views:

I would like to thank the many people who made helpful suggestions concerning my poster, including Andrej Bauer and especially Saul Schleimer, who offered many detailed suggestions.

An algebra of orders

Did you know that you can add and multiply orders? For any two order structures $A$ and $B$, we can form the ordered sum $A+B$ and ordered product $A\otimes B$, and other natural operations, such as the disjoint sum $A\sqcup B$, which make altogether an arithmetic of orders. We combine orders with these operations to make new orders, often with interesting properties. Let us explore the resulting algebra of orders!

$\newcommand\Z{\mathbb{Z}}\newcommand\N{\mathbb{N}}\newcommand\Q{\mathbb{Q}}\newcommand\P{\mathbb{P}}\newcommand\iso{\cong}\newcommand\of{\subseteq}\newcommand\set[1]{\{\,{#1}\,\}}$
One of the most basic operations that we can use to combine two orders is the disjoint sum operation $A\sqcup B$. This is the order resulting from placing a copy of $A$ adjacent to a copy of $B$, side-by-side, forming a combined order with no instances of the order relation between the two parts. If $A$ is the orange $\vee$-shaped order here and $B$ is the yellow linear order, for example, then $A\sqcup B$ is the combined order with all five nodes.

Another kind of addition is the ordered sum of two orders $A+B$, which is obtained by placing a copy of $B$ above a copy of $A$, as indicated here by adding the orange copy of $A$ and the yellow copy of $B$. Also shown is the sum $B+A$, with the summands reversed, so that we take $B$ below and $A$ on top. It is easy to check that the ordered sum of two orders is an order. One notices immediately, of course, that the resulting ordered sums $A+B$ and $B+A$ are not the same! The order $A+B$ has a greatest element, whereas $B+A$ has two maximal elements. So the ordered sum operation on orders is not commutative. Nevertheless, we shall still call it addition. The operation, which has many useful and interesting features, goes back at least to the 19th century with Cantor, who defined the addition of well orders this way.
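These operations are easy to experiment with on finite orders. Here is a minimal sketch in Python (my own rendering, with my own names such as `osum`, not code from the post), representing a finite order as a pair consisting of its set of elements and the set of pairs $(x,y)$ with $x\leq y$, reflexive pairs included:

```python
def tag(order, label):
    """An isomorphic copy of the order whose elements are (x, label) pairs."""
    elems, leq = order
    return ({(x, label) for x in elems},
            {((x, label), (y, label)) for (x, y) in leq})

def disjoint_sum(A, B):
    """A ⊔ B: disjointified copies of A and B side by side, with no
    instances of the order relation between the two parts."""
    (ea, la), (eb, lb) = tag(A, "orange"), tag(B, "yellow")
    return (ea | eb, la | lb)

def osum(A, B):
    """A + B: a copy of B placed entirely above a copy of A."""
    (ea, la), (eb, lb) = tag(A, "orange"), tag(B, "yellow")
    return (ea | eb, la | lb | {(x, y) for x in ea for y in eb})

# the orange ∨-shaped order and the yellow two-node linear order
vee = ({0, 1, 2}, {(0, 0), (1, 1), (2, 2), (0, 1), (0, 2)})
chain2 = ({0, 1}, {(0, 0), (1, 1), (0, 1)})

AB = osum(vee, chain2)   # has a greatest element
BA = osum(chain2, vee)   # has two maximal elements, so not isomorphic to AB
```

The `tag` helper performs the orange/yellow disjointification discussed more formally below.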

In order to illustrate further examples, I have assembled here an addition table for several simple finite orders. The choices for $A$ appear down the left side and those for $B$ at the top, with the corresponding sum $A+B$ displayed in each cell accordingly.

We can combine the two order addition operations, forming a variety of other orders this way.

The reader is encouraged to explore further how to add various finite orders using these two forms of addition. What is the smallest order that you cannot generate from $1$ using $+$ and $\sqcup$? Please answer in the comments.

We can also add infinite orders. Displayed here, for example, is the order $\N+(1\sqcup 1)$, the natural numbers wearing two yellow caps. The two yellow nodes at the top form a copy of $1\sqcup 1$, while the natural numbers are the orange nodes below. Every natural number (yes, all infinitely many of them) is below each of the two nodes at the top, which are incomparable to each other. Notice that although we have Hasse diagrams for each summand order here, there can be no minimal Hasse diagram for the sum: any particular edge from a natural number to one of the top nodes would be implied via transitivity by a higher such edge, and yet some such edges are required, since these order relations are not implied by the lower edges. So there is no minimal Hasse diagram.

This order happens to illustrate what is called an exact pair, which occurs in an order when a pair of incomparable nodes bounds a chain below, with the property that any node below both members of the pair is below something in the chain. This phenomenon occurs in sometimes unexpected contexts—any countable chain in the hierarchy of Turing degrees in computability theory, for example, admits an exact pair.

Let us turn now to multiplication. The ordered product $A\otimes B$ is the order resulting from having $B$ many copies of $A$. That is, we replace each node of $B$ with an entire copy of the $A$ order. Within each of these copies of $A$, the order relation is just as in $A$, while for nodes in different copies of $A$, we follow the $B$ relation between the corresponding nodes of $B$. It is not difficult to check that this is indeed an order relation. We can illustrate here with the same two orders we had earlier.

In forming the ordered product $A\otimes B$, in the center here, we take the two yellow nodes of $B$, shown greatly enlarged in the background, and replace them with copies of $A$. So we have ultimately two copies of $A$, one atop the other, just as $B$ has two nodes, one atop the other. We have added the order relations between the lower copy of $A$ and the upper copy, because in $B$ the lower node is related to the upper node. The order $A\otimes B$ consists only of the six orange nodes—the large highlighted yellow nodes of $B$ here serve merely as a helpful indication of how the product is formed and are not in any way part of the product order $A\otimes B$.

Similarly, with $B\otimes A$, at the right, we have the three enlarged orange nodes of $B$ in the background, which have each been replaced with copies of $A$. The nodes of each of the lower copies of $A$ are related to the nodes in the top copy, because in $B$ the two lower nodes are related to the upper node.

I have assembled a small multiplication table here for some simple finite orders.

So far we have given an informal account of how to add and multiply order structures. So let us briefly be a little more precise and formal about these matters.

In fact, when it comes to addition, there is a slightly irritating matter in defining what the sums $A\sqcup B$ and $A+B$ are exactly. Specifically, what are the domains? We would like to conceive of the domains of $A\sqcup B$ and $A+B$ simply as the union of the domains of $A$ and $B$—we’d like just to throw the two domains together and form the sum orders using that combined domain, placing $A$ on the $A$ part and $B$ on the $B$ part (and adding relations from the $A$ part to the $B$ part for $A+B$). Indeed, this works fine when the domains of $A$ and $B$ are disjoint, that is, if they have no points in common. But what if the domains of $A$ and $B$ overlap? In this case, we can’t seem to use the union in this straightforward manner. In general, we must disjointify the domains—we take copies of $A$ and $B$, if necessary, on domains that are disjoint, so that we can form the sums $A\sqcup B$ and $A+B$ on the union of those nonoverlapping domains.

What do we mean precisely by “taking a copy” of an ordered structure $A$? This way of talking in mathematics partakes in the philosophy of structuralism. We only care about our mathematical structures up to isomorphism, after all, and so it doesn’t matter which isomorphic copies of $A$ and $B$ we use; the resulting order structures $A\sqcup B$ will be isomorphic, and similarly for $A+B$. In this sense, we are defining the sum orders only up to isomorphism.

Nevertheless, we can be definite about it, if only to verify that indeed there are copies of $A$ and $B$ available with disjoint domains. So let us construct a set-theoretically specific copy of $A$, replacing each individual $a$ in the domain of $A$ with $(a,\text{orange})$, for example, and replacing the elements $b$ in the domain of $B$ with $(b,\text{yellow})$. If “orange” is a specific object distinct from “yellow,” then these new domains will have no points in common, and we can form the disjoint sum $A\sqcup B$ by using the union of these new domains, placing the $A$ order on the orange objects and the $B$ order on the yellow objects.

Although one can use this specific disjointifying construction to define what $A\sqcup B$ and $A+B$ mean as specific structures, I would find it to be a misunderstanding of the construction to take it as a suggestion that set theory is anti-structuralist. Set theorists are generally as structuralist as they come in mathematics, and in light of Dedekind’s categorical account of the natural numbers, one might even find the origin of the philosophy of structuralism in set theory. Rather, the disjointifying construction is part of the general proof that set theory abounds with isomorphic copies of whatever mathematical structure we might have, and this is part of the reason why it serves well as a foundation of mathematics for the structuralist. To be a structuralist means not to care which particular copy one has, to treat one’s mathematical structures as invariant under isomorphism.

But let me mention a certain regrettable consequence of defining the operations by means of a specific such disjointifying construction in the algebra of orders. Namely, it will turn out that neither the disjoint sum operation nor the ordered sum operation, as operations on order structures, are associative. For example, if we use $1$ to represent the one-point order, then $1\sqcup 1$ means the two-point side-by-side order, one orange and one yellow, but really what we mean is that the points of the domain are $\set{(a,\text{orange}),(a,\text{yellow})}$, where the original order is on domain $\set{a}$. The order $(1\sqcup 1)\sqcup 1$ then means that we take an orange copy of that order plus a single yellow point. This will have domain
$$\set{\bigl((a,\text{orange}),\text{orange}\bigr),\bigl((a,\text{yellow}),\text{orange}\bigr),(a,\text{yellow})}$$
The order $1\sqcup(1\sqcup 1)$, in contrast, means that we take a single orange point plus a yellow copy of $1\sqcup 1$, leading to the domain
$$\set{(a,\text{orange}),\bigl((a,\text{orange}),\text{yellow}\bigr),\bigl((a,\text{yellow}),\text{yellow}\bigr)}$$
These domains are not the same! So as order structures, the order $(1\sqcup 1)\sqcup 1$ is not identical with $1\sqcup(1\sqcup 1)$, and therefore the disjoint sum operation is not associative. A similar problem arises with $1+(1+1)$ and $(1+1)+1$.

But not to worry—we are structuralists and care about our orders here only up to isomorphism. Indeed, the two resulting orders are isomorphic as orders, and more generally, $(A\sqcup B)\sqcup C$ is isomorphic to $A\sqcup(B\sqcup C)$ for any orders $A$, $B$, and $C$, and similarly with $A+(B+C)\cong(A+B)+C$, as discussed with the theorem below. Furthermore, the order isomorphism relation is a congruence with respect to the arithmetic we have defined, which means that $A\sqcup B$ is isomorphic to $A'\sqcup B'$ whenever $A$ and $B$ are respectively isomorphic to $A'$ and $B'$, and similarly with $A+B$ and $A\otimes B$. Consequently, we can view these operations as associative, if we simply view them not as operations on the order structures themselves, but on their order-types, that is, on their isomorphism classes. This simple abstract switch in perspective restores the desired associativity. In light of this, we are free to omit the parentheses and write $A\sqcup B\sqcup C$ and $A+B+C$, if we care about our orders only up to isomorphism. Let us therefore adopt this structuralist perspective for the rest of our treatment of the algebra of orders.

Let us give a more precise formal definition of $A\otimes B$, which requires no disjointification. Specifically, the domain is the set of pairs $\set{(a,b)\mid a\in A, b\in B}$, and the order is defined by $(a,b)\leq_{A\otimes B}(a',b')$ if and only if $b\leq_B b'$, or $b=b'$ and $a\leq_A a'$. This order is known as the reverse lexical order, since we are ordering the nodes in the dictionary manner, except starting from the right letter rather than the left as in an ordinary dictionary. One could of course have defined the product using the lexical order instead of the reverse lexical order, and this would give $A\otimes B$ the meaning of “$A$ copies of $B$.” This would be a fine alternative and in my experience mathematicians who rediscover the ordered product on their own often tend to use the lexical order, which is natural in some respects. Nevertheless, there is a huge literature with more than a century of established usage with the reverse lexical order, from the time of Cantor, who defined ordinal multiplication $\alpha\beta$ as $\beta$ copies of $\alpha$. For this reason, it seems best to stick with the reverse lexical order and the accompanying idea that $A\otimes B$ means “$B$ copies of $A$.” Note also that with the reverse lexical order, we shall be able to prove left distributivity $A\otimes(B+C)=A\otimes B+A\otimes C$, whereas with the lexical order, one will instead have right distributivity $(B+C)\otimes^* A=B\otimes^* A+C\otimes^* A$.
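Continuing the Python sketch from above (again with names of my own choosing), the reverse lexical definition translates directly into code, and one can check the left distributivity law on small examples by implementing the rearranging map $(a,(x,\text{tag}))\mapsto((a,x),\text{tag})$ used in the proof given later in this section:

```python
def oprod(A, B):
    """A ⊗ B: B many copies of A, in the reverse lexical order. Assumes
    the relations include the reflexive pairs, as in the sketch above."""
    (ea, la), (eb, lb) = A, B
    elems = {(a, b) for a in ea for b in eb}
    leq = {(p, q) for p in elems for q in elems
           if (p[1], q[1]) in lb and (p[1] != q[1] or (p[0], q[0]) in la)}
    return (elems, leq)

def check_left_distributivity(A, B, C):
    """Verify A ⊗ (B ⊔ C) ≅ (A ⊗ B) ⊔ (A ⊗ C) via the evident map."""
    lhs = oprod(A, disjoint_sum(B, C))
    rhs = disjoint_sum(oprod(A, B), oprod(A, C))
    tau = lambda p: ((p[0], p[1][0]), p[1][1])   # (a,(x,tag)) -> ((a,x),tag)
    assert {tau(x) for x in lhs[0]} == rhs[0]
    assert {(tau(x), tau(y)) for (x, y) in lhs[1]} == rhs[1]

check_left_distributivity(vee, chain2, vee)   # passes without complaint
```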

Let us begin to prove some basic facts about the algebra of orders.

Theorem. The following identities hold for orders $A$, $B$, and $C$.

  1. Associativity of disjoint sum, ordered sum, and ordered product.\begin{eqnarray*}A\sqcup(B\sqcup C) &\iso& (A\sqcup B)\sqcup C\\ A+(B+C) &\iso& (A+B)+C\\ A\otimes(B\otimes C) &\iso& (A\otimes B)\otimes C \end{eqnarray*}
  2. Left distributivity of product over disjoint sum and ordered sum.\begin{eqnarray*} A\otimes(B\sqcup C) &\iso& (A\otimes B)\sqcup(A\otimes C)\\ A\otimes(B+C) &\iso& (A\otimes B)+(A\otimes C) \end{eqnarray*}

In each case, these identities are clear from the informal intended meaning of the orders. For example, $A+(B+C)$ is the order resulting from having a copy of $A$, and above it a copy of $B+C$, which is a copy of $B$ and a copy of $C$ above it. So one has altogether a copy of $A$, with a copy of $B$ above that and a copy of $C$ on top. And this is the same as $(A+B)+C$, so they are isomorphic.

One can also aspire to give a detailed formal proof verifying that our color-coded disjointifying process works as desired, and the reader is encouraged to do so as an exercise. To my way of thinking, however, such a proof offers little in the way of mathematical insight into the algebra of orders. Rather, it is about checking the fine print of our disjointifying process and making sure that things work as we expect. Several of the arguments can be described as parenthesis-rearranging arguments—one extracts the desired information from the structure of the domain order and puts that exact same information into the correct form for the target order.

For example, if we have used the color-scheme disjointifying process described above, then the elements of $A\sqcup(B\sqcup C)$ each have one of the following forms, where $a\in A$, $b\in B$, and $c\in C$:
$$(a,\text{orange})$$
$$\bigl((b,\text{orange}),\text{yellow}\bigr)$$
$$\bigl((c,\text{yellow}),\text{yellow}\bigr)$$
We can define the color-and-parenthesis-rearranging function $\pi$ to put them into the right form for $(A\sqcup B)\sqcup C$ as follows:
\begin{align*}
\pi:(a,\text{orange})\quad&\mapsto\quad \bigl((a,\text{orange}),\text{orange}\bigr) \\
\pi:\bigl((b,\text{orange}),\text{yellow}\bigr)\quad&\mapsto\quad \bigl((b,\text{yellow}),\text{orange}\bigr) \\
\pi:\bigl((c,\text{yellow}),\text{yellow}\bigr)\quad&\mapsto\quad (c,\text{yellow})
\end{align*}
In each case, we will preserve the order, and since the orders are side-by-side, the cases never interact in the order, and so this is an isomorphism.

Similarly, for distributivity, the elements of $A\otimes(B\sqcup C)$ have the two forms:
$$\bigl(a,(b,\text{orange})\bigr)$$
$$\bigl(a,(c,\text{yellow})\bigr)$$
where $a\in A$, $b\in B$, and $c\in C$. Again we can define the desired isomorphism $\tau$ by putting these into the right form for $(A\otimes B)\sqcup(A\otimes C)$ as follows:
\begin{align*}
\tau:\bigl(a,(b,\text{orange})\bigr)\quad&\mapsto\quad \bigl((a,b),\text{orange}\bigr) \\
\tau:\bigl(a,(c,\text{yellow})\bigr)\quad&\mapsto\quad\bigl((a,c),\text{yellow}\bigr)
\end{align*}
And again, this is an isomorphism, as desired.

Since order multiplication is not commutative, it is natural to inquire about the right-sided distributivity laws:
\begin{eqnarray*}
(B+C)\otimes A&\overset{?}{\cong}&(B\otimes A)+(C\otimes A)\\
(B\sqcup C)\otimes A&\overset{?}{\cong}&(B\otimes A)\sqcup(C\otimes A)
\end{eqnarray*}
Unfortunately, however, these do not hold in general, and the following instances are counterexamples. Can you see what to take as $A$, $B$, and $C$? Please answer in the comments.

Theorem.

  1. If $A$ and $B$ are linear orders, then so are $A+B$ and $A\otimes B$.
  2. If $A$ and $B$ are nontrivial linear orders and both are endless, then $A+B$ is endless; if at least one of them is endless, then $A\otimes B$ is endless.
  3. If $A$ is an endless dense linear order and $B$ is linear, then $A\otimes B$ is an endless dense linear order.
  4. If $A$ is an endless discrete linear order and $B$ is linear, then $A\otimes B$ is an endless discrete linear order.

Proof. If both $A$ and $B$ are linear orders, then it is clear that $A+B$ is linear. Any two points within the $A$ copy are comparable, as are any two points within the $B$ copy, and every point in the $A$ copy is below every point in the $B$ copy. So any two points are comparable and thus we have a linear order. With the product $A\otimes B$, we have $B$ many copies of $A$, and this is linear since any two points within one copy of $A$ are comparable, and otherwise they come from different copies, which are comparable since $B$ is linear. So $A\otimes B$ is linear.

For statement (2), we know that $A+B$ and $A\otimes B$ are nontrivial linear orders. If both $A$ and $B$ are endless, then clearly $A+B$ is endless, since every node in $A$ has something below it and every node in $B$ has something above it. For the product $A\otimes B$, if $A$ is endless, then every node in any copy of $A$ has nodes above and below it, and so this will be true in $A\otimes B$; and if $B$ is endless, then there will always be higher and lower copies of $A$ to consider, so again $A\otimes B$ is endless, as desired.

For statement (3), assume that $A$ is an endless dense linear order and that $B$ is linear. We know from (1) that $A\otimes B$ is a linear order. Suppose that $x<y$ in this order. If $x$ and $y$ live in the same copy of $A$, then there is a node $z$ between them, because $A$ is dense. If $x$ occurs in one copy of $A$ and $y$ in another, then because $A$ is endless, there will be a node $z$ above $x$ in its same copy, leading to $x<z<y$ as desired. (Note: we don’t need $B$ to be dense.)

For statement (4), assume instead that $A$ is an endless discrete linear order and $B$ is linear. We know that $A\otimes B$ is a linear order. Every node of $A\otimes B$ lives in a copy of $A$, where it has an immediate successor and an immediate predecessor, and these are also immediate successor and predecessor in $A\otimes B$. From this, it follows also that $A\otimes B$ is endless, and so it is an endless discrete linear order. $\Box$

The reader is encouraged to consider as an exercise whether one can drop the “endless” hypotheses in the theorem. Please answer in the comments.

Theorem. The endless discrete linear orders are exactly those of the form $\Z\otimes L$ for some linear order $L$.

Proof. If $L$ is a linear order, then $\Z\otimes L$ is an endless discrete linear order by the theorem above, statement (4). So any order of this form has the desired feature. Conversely, suppose that $\P$ is an endless discrete linear order. Define an equivalence relation for points in this order by which $p\sim q$ if and only if $p$ and $q$ are at finite distance, in the sense that there are only finitely many points between them. This relation is easily seen to be reflexive, transitive, and symmetric, and so it is an equivalence relation. Since $\P$ is an endless discrete linear order, every object in the order has an immediate successor and immediate predecessor, which remain $\sim$-equivalent, and from this it follows that the equivalence classes are each ordered like the integers $\Z$, as indicated by the figure here.

The equivalence classes amount to a partition of $\P$ into disjoint segments of order type $\Z$, as in the various colored sections of the figure. Let $L$ be the induced order on the equivalence classes. That is, the domain of $L$ consists of the equivalence classes $\P/\sim$, which are each a $\Z$ chain in the original order, and we say $[a]<_L[b]$ just in case $a<_{\P}b$. This is a linear order on the equivalence classes. And since $\P$ is $L$ copies of its equivalence classes, each of which is ordered like $\Z$, it follows that $\P$ is isomorphic to $\Z\otimes L$, as desired. $\Box$

(Interested readers are advised that the argument above uses the axiom of choice, since in order to assemble the isomorphism of $\P$ with $\Z\otimes L$, we need in effect to choose a center point for each equivalence class.)

If we consider the integers inside the rational order $\Z\of\Q$, it is clear that we can have a discrete suborder of a dense linear order. How about a dense suborder of a discrete linear order?

Question. Is there a discrete linear order with a suborder that is a dense linear order?

What? How could that happen? In my experience, mathematicians first coming to this topic often respond instinctively that this should be impossible. I have seen sophisticated mathematicians make such a pronouncement when I asked the audience about it in a public lecture. The fundamental nature of a discrete order, after all, is completely at odds with density, since in a discrete order, there is a next point up and down, and a next next point, and so on, and this is incompatible with density.

Yet, surprisingly, the answer is Yes! It is possible—there is a discrete order with a suborder that is densely ordered. Consider the extremely interesting order $\Z\otimes\Q$, which consists of $\Q$ many copies of $\Z$, laid out here increasing from left to right. Each tiny blue dot is a rational number, which has been replaced with an entire copy of the integers, as you can see in the magnified images at $a$, $b$, and $c$.

The order is quite subtle, and so let me also provide an alternative presentation of it. We have many copies of $\Z$, and those copies are densely ordered like $\Q$, so that between any two copies of $\Z$ is another one, like this:

Perhaps it helps to imagine that the copies of $\Z$ are getting smaller and smaller as you squeeze them in between the larger copies. But you can indeed always fit another copy of $\Z$ between, while leaving room for the further even tinier copies of $\Z$ to come.

The order $\Z\otimes\Q$ is discrete, in light of the theorem characterizing discrete linear orders. But also, this is clear, since every point of $\Z\otimes\Q$ lives in its local copy of $\Z$, and so has an immediate successor and predecessor there. Meanwhile, if we select exactly one point from each copy of $\Z$, the $0$ of each copy, say, then these points are ordered like $\Q$, which is dense. Thus, we have proved:

Theorem. The order $\Z\otimes\Q$ is a discrete linear order having a dense linear order as a suborder.
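For readers who like to compute, here is a small Python rendering of $\Z\otimes\Q$ (a sketch of my own devising, not from the post): elements are pairs $(k,q)$ with $k$ an integer and $q$ a rational, compared in the reverse lexical order. The successor map witnesses discreteness, while the pairs of the form $(0,q)$ form a dense suborder:

```python
from fractions import Fraction

def less(p1, p2):
    """The reverse lexical order on Z ⊗ Q: compare the rational coordinate
    first, and within a single copy of Z compare the integer coordinate."""
    (z1, q1), (z2, q2) = p1, p2
    return q1 < q2 or (q1 == q2 and z1 < z2)

def successor(p):
    """The immediate successor, witnessing that the order is discrete."""
    z, q = p
    return (z + 1, q)

def between(p1, p2):
    """A point of the dense suborder {(0, q)} strictly between p1 and p2,
    assuming p1 < p2 lie in different copies of Z."""
    (_, q1), (_, q2) = p1, p2
    return (0, (q1 + q2) / 2)

a, b = (0, Fraction(0)), (0, Fraction(1))
mid = between(a, b)                    # (0, 1/2)
assert less(a, mid) and less(mid, b)   # density of the suborder
```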

One might be curious now about the order $\Q\otimes\Z$, which is $\Z$ many copies of $\Q$. This order, however, is a countable endless dense linear order, and therefore is isomorphic to $\Q$ itself.


This material is adapted from my book-in-progress, Topics in Logic, drawn from Chapter 3 on Relational Logic, which includes an extensive section on order theory, of which this is an important summative part.

Linear gradings of partial orders

Order relations are often fruitfully conceived as being stratified into “levels” in a natural way. The level structure is meant to be compatible with the order, in the sense that as one moves strictly up in the order, one also ascends successively from lower levels to strictly higher levels. With the simple order relation pictured here, for example, we might naturally imagine the least element $0$ on a bottom level, the three middle nodes $a$, $b$, and $c$ on an intermediate level, and node $1$ on a level at the top. With this level structure, as we move up strictly in the order, then we also move up strictly in the hierarchy of levels.

What exactly is a level? Let us be a little more precise. We can depict the intended level structure of our example order more definitely as in the figure here. At the left appears the original order relation, while the yellow highlighted bands organize the nodes into three levels, with node $0$ on the bottom level, nodes $a$, $b$, and $c$ on the middle level, and node $1$ on the top level. This level structure in effect describes a linear preorder relation $\leq$ for which $0\leq a,b,c\leq 1$, with the three intermediate nodes all equivalent with respect to this preorder—they are on the same level.

$\def\<#1>{\left\langle#1\right\rangle}\newcommand{\of}{\subseteq}\newcommand{\set}[1]{\{\,{#1}\,\}}\renewcommand\emptyset{\varnothing}$Thus, we realize that a level structure of an order relation is simply a linear preorder relation that respects strict instances of the original order, and the levels are simply the equivalence classes of the preorder. More precisely, we define that a linear grading of an order relation $\<A,\preccurlyeq>$ is a linear preorder relation $\leq$ on $A$ for which every strict instance of the original order is also graded strictly by the linear preorder; that is, we require that
$$a\prec b\quad\text{ implies }\quad a<b$$ Thus, any strictly lower point in the original order is on a lower level, and we define that objects are on the same level if they are equivalent with respect to the preorder. A linearly graded order is a relational structure $\<A,\preccurlyeq,\leq>$ with two orders on the same domain, the first $\preccurlyeq$ being an order relation on $A$ and the second $\leq$ being a linear preorder relation that grades the first order.

It turns out that there are often far more ways to stratify a given order by levels than one might have expected. For the simple order above, for example, there are thirteen distinct linear grading orders, as shown here.
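One can confirm the count of thirteen by brute force. The following Python sketch (my own illustration, taking the order to be the five-node example with $0$ below $a$, $b$, $c$ and those below $1$) enumerates every linear preorder on the five nodes, presented as an ordered tuple of levels, and counts those that grade the order:

```python
from itertools import permutations

def set_partitions(xs):
    """All partitions of the list xs into nonempty blocks."""
    if not xs:
        yield []
        return
    first, rest = xs[0], xs[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [part[i] + [first]] + part[i+1:]
        yield part + [[first]]

def linear_preorders(xs):
    """Every linear preorder on xs, as an ordered tuple of levels."""
    for part in set_partitions(xs):
        yield from permutations(part)

def grades(levels, prec):
    """Strictly lower in the order must mean a strictly lower level."""
    lvl = {x: i for i, block in enumerate(levels) for x in block}
    return all(lvl[a] < lvl[b] for (a, b) in prec)

nodes = ["0", "a", "b", "c", "1"]
prec = {("0", m) for m in "abc"} | {(m, "1") for m in "abc"} | {("0", "1")}
print(sum(grades(lv, prec) for lv in linear_preorders(nodes)))   # prints 13
```

The count is thirteen because the levels of $0$ and $1$ are forced, while the three middle nodes may be arranged in any of the thirteen weak orders on three elements.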

The conclusion is inescapable that the level concept is not inherent in the order relation itself, for a given order relation may admit a huge variety of different level hierarchies, each of them compatible with the given order.

One should therefore not make the mistake of thinking that if one has an order relation, such as a philosophical reduction notion of some kind or a supervenience relation, then one is automatically entitled to speak of “levels” of the order. One might want to speak of “low-level” phenomena or “high-level” concepts in the order, but one must recognize that the order relation itself does not determine a specific hierarchy of levels, although it does place limitations on the possible stratifications. My point is that there is often surprising flexibility in the nature of the level structure, as the example above shows even in a very simple case, and so what counts as low or high in terms of levels may be much less determined than one expects. In some of the linear gradings above, for example, the node $a$ could be described as high-level, and in others, it is low-level. Therefore if one wants to speak of levels for one’s order, then one must provide further elucidation of the stratification one has in mind.

Meanwhile, we often can provide a natural level structure. In the power set $P(X)$ of a finite set $X$, ordered by the subset relation $A\of B$, for example, we can naturally stratify the relation by the sizes of the sets, that is, in terms of the number of elements. Thus, we would place $\emptyset$ on the bottom level, and the singleton sets $\{a\}$ on the next level, and then the doubletons $\{a,b\}$, and so on. This stratification by cardinality doesn’t quite work when $X$ is infinite, however, since there can be instances of strict inclusion $A\subsetneq B$ where $A$ and $B$ are both infinite and nevertheless equinumerous. Is there a level stratification of the infinite power set order?

Indeed there is, for every order relation admits a grading into levels.

Theorem. Every order relation $\<A,\preccurlyeq>$ can be linearly graded. Indeed, every order relation can be extended to a linear order (not merely a preorder), and so it can be graded into levels with exactly one node on each level.

Proof. Let us begin with the finite case, which we prove by induction. Assume $\preccurlyeq$ is an order relation on a finite set $A$. We seek to find a linear order $\leq$ on $A$ such that $x\preccurlyeq y\implies x\leq y$. If $A$ has at most one element, then we are done immediately, since $\preccurlyeq$ would itself already be linear.

Let us proceed by induction. Assume that every order of size $n$ has a linear grading, and that we have a partial order $\preccurlyeq$ on a set $A$ of size $n+1$. Every finite order has at least one maximal element, so let $a\in A$ be a $\preccurlyeq$-maximal element. If we consider the relation $\preccurlyeq$ on the remaining elements $A\setminus\{a\}$, it is a partial order of size $n$, and thus admits a linear grading order $\leq$. We can now simply place $a$ atop that order, and this will be a linear grading of $\<A,\preccurlyeq>$, because $a$ was maximal, and so making it also greatest in the grading order will cohere with the grading condition.

So by induction, every finite partial order relation can be extended to a linear order.
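The finite-case induction is in effect the familiar topological-sorting algorithm. Here is a short Python sketch (my own illustration) that extends a finite order to a linear order by repeatedly peeling off a maximal element, just as in the proof:

```python
def linear_extension(elems, prec):
    """Extend the finite order given by the strict relation prec to a
    linear order, by repeatedly removing a maximal element."""
    remaining, tower = set(elems), []
    while remaining:
        top = next(x for x in remaining
                   if not any((x, y) in prec for y in remaining))
        tower.append(top)       # top is placed above everything still left
        remaining.remove(top)
    tower.reverse()             # listed from bottom to top
    return tower

# the example order with 0 below a, b, c, which are below 1
prec = {("0", m) for m in "abc"} | {(m, "1") for m in "abc"} | {("0", "1")}
print(linear_extension({"0", "a", "b", "c", "1"}, prec))
# e.g. ['0', 'b', 'a', 'c', '1'], a linear order refining the original
```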

Now, we consider the general case. Suppose that $\preccurlyeq$ is a partial order relation on a (possibly infinite) set $A$. We construct a theory $T$ in a language with a new relation symbol $\leq$ and constant symbols $\dot a$ for every element $a\in A$. The theory $T$ should assert that $\leq$ is a linear order, that $\dot a\leq \dot b$, whenever it is actually true that $a\preccurlyeq b$ in the original order, and that $\dot a\neq\dot b$ whenever $a\neq b$. So the theory $T$ describes the situation that we want, namely, a linear order that conforms with the original partial order.

The key observation is that every finite subset of the theory will be satisfiable, since such a finite subtheory in effect reduces to the case of finite orders, which we handled above. That is, if we take only finitely many of the axioms of $T$, then it involves a finite partial order on the nodes that are mentioned, and by the finite case of the theorem, this order is refined by a linear order, which provides a model of the subtheory. So every finite subtheory of $T$ is satisfiable, and so by the compactness theorem, $T$ itself also is satisfiable.

Any model of $T$ provides a linear order $\leq$ on the constant symbols $\dot a$, which will be a linear order extending the original order relation, as desired. $\Box$

This material is adapted from my book-in-progress, Topics in Logic, which includes a chapter on relational logic, with an extended section on orders.

The lattice of sets of natural numbers is rich

$\newcommand\of{\subseteq}\newcommand\N{\mathbb{N}}\newcommand\R{\mathbb{R}}\def\<#1>{\left\langle#1\right\rangle}\newcommand\Z{\mathbb{Z}}\newcommand\Q{\mathbb{Q}}\newcommand\ltomega{{{<}\omega}}\newcommand\unaryminus{-}\newcommand\intersect{\cap}\newcommand\union{\cup}\renewcommand\emptyset{\varnothing}$Let us explore the vast and densely populated expanses of the lattice of all sets of natural numbers—the power set lattice $\<P(\N),\of>$.

We find in this lattice every possible set of natural numbers $A\of\N$, and we consider them with respect to the subset relation $A\of B$. This is a lattice order because the union of two sets $A\union B$ is their least upper bound with respect to inclusion and the intersection $A\intersect B$ is their greatest lower bound. Indeed, it is a distributive lattice in light of $A\intersect(B\union C)=(A\intersect B)\union(A\intersect C)$, but furthermore, as a power set algebra it is a Boolean algebra and every Boolean algebra is a distributive lattice.

The empty set $\emptyset$ is the global least element at the bottom of the lattice, of course, and the whole set $\N$ is the greatest element, at the top. The singletons $\set{n}$ are atoms, one step up from $\emptyset$, while their complements $\N\setminus\set{n}$ are coatoms, one step down from $\N$. One generally conceives of the finite sets as clustering near the Earthly bottom of the lattice, finitely close to $\emptyset$, whereas the cofinite sets soar above, finitely close to the heavenly top. In the vast central regions between, in contrast, we find the infinite-coinfinite sets, including many deeply interesting sets such as the prime numbers, the squares, the square-free numbers, the composites, and uncountably many more.

Question. Which familiar orders can we find as suborders in the power set lattice $P(\N)$ of all sets of natural numbers?

We can easily find a copy of the natural-number order $\<\N,\leq>$ in the power set lattice $P(\N)$, simply by considering the chain of initial segments:
$$\emptyset\of\set{0}\of\set{0,1}\of\set{0,1,2}\of\cdots$$
This sequence climbs like a ladder up the left side of the main figure. Curious readers will notice that although every one of these sets is finite and hence conceived as clinging to Earth, as we had said, nevertheless there is no upper bound for this sequence in the lattice other than the global maximum $\N$—any particular natural number is eventually subsumed. Thus one climbs to the heavens on this ladder, which stretches to the very top of the lattice, while remaining close to Earth all the while on every rung.

The complements of those sets similarly form an infinite descending sequence:
$$\cdots\ \of\ \N\setminus\set{0,1,2}\ \of\ \N\setminus\set{0,1}\ \of\ \N\setminus\set{0}\ \of\ \N$$
This chain of cofinite sets dangles like a knotted rope from the heavens down the right side of the main figure. Every knot on this rope is a cofinite set, which we conceived as very near the top, and yet the rope dangles temptingly down to Earth, for there is no lower bound except $\emptyset$.

Can we find a copy of the discrete integer order $\<\Z,\leq>$ in the power set lattice?

Yes, we can. An example is already visible in the red chain at the right side of main lattice diagram, a sky crane hanging in space. Standing at the set of odd numbers, we can ascend in discrete steps by adding additional even numbers one at a time; or we can descend similarly by removing successive odd numbers. The result is a discrete chain isomorphic to the integer order $\Z$, and it is unbounded except by the extreme points $\emptyset$ and $\N$.
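To be concrete, here is a small Python sketch of this sky-crane chain (my own illustration, truncated to a finite stand-in for $\N$ so that it runs): the $k$-th rung adds the first $k$ even numbers to the odds when $k\geq 0$, and removes the first $-k$ odd numbers when $k<0$:

```python
def rung(k):
    """The k-th set of a chain ordered like Z, starting from the odd
    numbers (computed within a finite stand-in for N, for illustration)."""
    odds = {n for n in range(100) if n % 2 == 1}
    if k >= 0:
        return odds | {2 * n for n in range(k)}       # ascend: add evens
    return odds - {2 * n + 1 for n in range(-k)}      # descend: drop odds

# strict subset comparisons: the rungs form a discrete chain of type Z
assert rung(-2) < rung(-1) < rung(0) < rung(1) < rung(2)
```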

Can we find a dense order, say, a copy of the rational line $\<\Q,\leq>$?

What? You want to find a dense order in the lattice of sets of natural numbers? That might seem impossible, if one thinks of the lattice as having a fundamentally discrete nature. The nearest neighbors of any set, after all, are obtained by adding or removing exactly one element—doing so makes a discrete step up or down with no strictly intermediate set. Does the essentially discrete nature of the lattice suggest that we cannot embed the dense order $\<\Q,\leq>$?

No! The surprising fact is that indeed we can embed the rational line into the power set lattice $P(\N)$. Indeed, much more is true—it turns out that we can embed a copy of any countable order whatsoever into the lattice $P(\N)$. Every countable order that occurs anywhere at all occurs in the lattice of sets of natural numbers.

Theorem. The power set lattice on the natural numbers $P(\N)$ is universal for all countable orders. That is, every order relation on a countable domain is isomorphic to a suborder of $\<P(\N),\of>$.

Proof. Let us prove first that every order relation $\<L,\preccurlyeq>$ is isomorphic to the subset relation $\of$ on a collection of sets. Namely, for each $q\in L$ let $q{\downarrow}$ be the down set of $q$, the set of predecessors of $q$:
$$q{\downarrow}=\set{p\in L\mid p\preccurlyeq q}\of L$$
This is a subset of $L$, and what I claim is that
$$p\preccurlyeq q\quad\text{ if and only if }\quad p{\downarrow}\of q{\downarrow}.$$
This is almost immediate. For the forward implication, if $p\preccurlyeq q$, then any order element below $p$ is also below $q$, and so $p{\downarrow}\of q{\downarrow}$. Conversely, if $p{\downarrow}\of q{\downarrow}$, then since $p\in p{\downarrow}$, we must have $p\in q{\downarrow}$ and hence $p\preccurlyeq q$ as desired. So the map $p\mapsto p{\downarrow}$ is an isomorphism of the original order $\<L,\preccurlyeq>$ with the family of all downsets $\set{p{\downarrow}\mid p\in L}$ using the subset relation.

To complete the proof, we simply observe that the power set lattice $P(L)$ of a countably infinite set $L$ is isomorphic to $P(\N)$, and for finite $L$ it is isomorphic to the power set lattice below a finite set of natural numbers. This is because any bijection of $L$ with a set of natural numbers induces a corresponding isomorphism of the respective subsets. So we may find a copy of $L$ inside $P(\N)$, as desired. $\Box$

One may make the proof more concrete by combining the two steps, like this: if $\<L,\preccurlyeq>$ is any countable order relation, then enumerate the domain set $L=\set{p_0,p_1,p_2,\ldots}$, and for each $p\in L$ let $A_p=\set{n\in\N\mid p_n\preccurlyeq p}\of\N$. This is like the down set of $p$, except we take the indices of the elements instead of the elements themselves. It follows that $p\preccurlyeq q$ if and only if $A_p\of A_q$, and so the mapping $p\mapsto A_p$ is an order isomorphism of $\<L,\preccurlyeq>$ with the collection of $A_p$ sets under $\of$, as desired.
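This concrete form of the embedding is short enough to carry out in code. Here is a Python sketch (my own illustration), using divisibility on $\{1,\dots,10\}$ as the sample order (finite, for the sake of the demonstration, but the definition of the sets $A_p$ is verbatim the one just given):

```python
def embed(elems, leq):
    """The map p -> A_p = {n : p_n ≼ p} from the proof, using a fixed
    enumeration p_0, p_1, ... of the domain."""
    enum = list(elems)
    return {p: {n for n, q in enumerate(enum) if (q, p) in leq}
            for p in elems}

# divisibility on {1, ..., 10} as a sample order
elems = range(1, 11)
leq = {(a, b) for a in elems for b in elems if b % a == 0}
A = embed(elems, leq)

# p ≼ q holds exactly when A_p ⊆ A_q
assert all(((p, q) in leq) == (A[p] <= A[q]) for p in elems for q in elems)
```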

Since the set of rational numbers $\Q$ is countable, it follows as an instance of the universality theorem above that we may find a copy of the rational line $\<\Q,\leq>$ inside the power set lattice $\<P(\N),\of>$. Can you find an elegant explicit presentation of the rational line in the power set lattice? In light of the characterization of $\Q$ as a countable endless dense linear order, it will suffice to find any chain in $P(\N)$ that forms a countable endless dense linear order.

But wait, there is more! We can actually find a copy of the real line $\<\R,\leq>$ in the power set lattice on the naturals! Wait…really? How is that possible? The reals are uncountable, and so this would be an uncountable chain in the lattice of sets of natural numbers, but every subset $A\of\N$ is countable, and there are only countably many elements to add to $A$ as you go up. How could there be an uncountable chain in the lattice?

Let me explain.

Theorem. The power set lattice on the natural numbers $\<P(\N),\of>$ has an uncountable chain, and indeed, it has a copy of the real continuum $\<\R,\leq>$.

Proof. We have already observed that there must be a copy of the rational line $\<\Q,\leq>$ in the power set lattice $P(\N)$. In other words, there is a map $q\mapsto A_q\of\N$ such that
$$p<q\quad\text{ if and only if }\quad A_p\of A_q$$
for any rational numbers $p,q\in\Q$.

We may now find our desired copy of the real continuum simply by completing this embedding. Namely, for any real number $r\in\R$, let
$$A_r=\bigcup_{\substack{q\leq r\\ q\in\Q}} A_q.$$
Since the sets $A_q$ for rational numbers $q$ form a chain, it follows easily that $r\leq s$ implies $A_r\of A_s$. And if $r<s$, then $A_r$ will be strictly contained in some $A_q$ for a rational number $q$ strictly between $r<q<s$, and consequently strictly contained in $A_s$. From this, it follows that $r\leq s$ if and only if $A_r\of A_s$, and so we have found a copy of the real number order in the power set lattice, as desired. $\Box$

Let us now round out the surprising events of this section by proving another shocking fact—we can also find an uncountable antichain in the power set lattice $P(\N)$.

Theorem. The power set lattice on the natural numbers $\<P(\N),\of>$ has an uncountable antichain, a collection of continuum many pairwise incomparable sets.

Proof. Consider the tree of finite binary sequences $2^{\ltomega}$. This is the set of all finite binary sequences, ordered by the relation $s\leq t$ that holds when $t$ extends $s$ by possibly adding additional binary digits on the end. This is a countable order relation, and so by the universality theorem above we may find a copy of it in the power set lattice $P(\N)$. But let us be a little more specific. We shall label each node $s$ of the tree $2^{\ltomega}$ with a distinct natural number. For any infinite binary path $x\in 2^\omega$ through the tree, let $A_x$ be the set of numbers that appear along the path. Every $A_x$ is infinite, but if $x\neq y$, then $x$ and $y$ will eventually branch away from each other, and consequently $A_x$ and $A_y$ will have only finitely many numbers in common. In particular, neither $A_x$ nor $A_y$ will be a subset of the other, and so we have found an antichain in the power set lattice. The number of paths through the binary tree is the number of infinite binary sequences, which is the same as the cardinality of the continuum of real numbers, as desired. $\Box$
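The almost-disjoint coding in this proof is easy to carry out explicitly. In the following Python sketch (my own illustration, truncating the infinite paths at a finite depth so that it runs), a binary string $s$ is labeled by the natural number with binary expansion $1s$, and $A_x$ collects the labels of the finite prefixes of the path $x$:

```python
def label(s):
    """A distinct natural number for each finite binary string."""
    return int("1" + s, 2)      # "" -> 1, "0" -> 2, "1" -> 3, "01" -> 5, ...

def branch_set(x, depth=20):
    """A_x: labels of the prefixes of the path x, given as a function
    from positions to binary digits (truncated at a finite depth here)."""
    return {label("".join(str(x(i)) for i in range(n))) for n in range(depth)}

A = branch_set(lambda n: 0)        # the path 000...
B = branch_set(lambda n: n % 2)    # the path 0101...
print(sorted(A & B))               # only the shared prefixes: finite overlap
assert not A <= B and not B <= A   # incomparable: an antichain of two sets
```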

So the lattice $P(\N)$ offers a rich complexity. But is there also homogeneity to be found here? Place yourself at one of the sets in the central region of the lattice, an infinite coinfinite set. There are infinitely many new atoms available—you can add just this element, or just that one, or indeed any combination of them, making a larger set. Above any coinfinite point in the lattice, therefore, one finds a perfect copy of the whole lattice again. Looking upward from any coinfinite set, it always looks the same.

Theorem. The lattice of all sets of natural numbers is homogeneous in several respects.

  1. The lattice of sets above any given coinfinite set $A\of\N$ is isomorphic to the whole power set lattice $P(\N)$.
  2. The lattice of sets below any given infinite set $B\of\N$ is isomorphic to the whole power set lattice $P(\N)$.
  3. For any two infinite coinfinite sets $A,B\of \N$, there is an automorphism of the lattice $P(\N)$ carrying $A$ to $B$.

Proof. Let us prove statement (2) first. Consider any infinite set $B\of\N$ and the sublattice in $P(\N)$ below it, which is precisely the power set lattice $P(B)$ of all subsets of $B$. Since $B$ is an infinite set of natural numbers, there is a bijection $\pi:B\to\N$. We may therefore transform subsets of $B$ to subsets of $\N$ by applying the bijection. Namely, for any $X\of B$ define $\bar\pi(X)=\pi[X]=\set{\pi(b)\mid b\in X}$. Since $\pi$ is injective, we have $X\of Y$ if and only if $\bar\pi(X)\of\bar\pi(Y)$, and the map $\bar\pi$ maps onto $P(\N)$ precisely because $\pi$ is surjective, so every $Z\of\N$ is $Z=\bar\pi(X)$, where $X=\pi^{\unaryminus 1}Z=\set{b\in B\mid \pi(b)\in Z}$. So $\bar\pi$ is an isomorphism of the lattice $P(B)$ with $P(\N)$, as desired.

For statement (1), consider any coinfinite set $A\of\N$ and let $A{\uparrow}$ be the part of the power set lattice above $A$, meaning $A{\uparrow}=\set{X\of\N\mid A\of X}$. Since the complement $B=\N\setminus A$ is infinite, there is a bijection of it with the whole set of natural numbers $\pi:B\to\N$. I claim that the lattice $A{\uparrow}$ is isomorphic with the lattice $P(B)$, which by statement (2) we know is isomorphic to $P(\N)$. To see this, we simply cast $A$ out of any set $X$ containing $A$. What remains is a subset of $B$, and every subset of $B$ will be realized. More specifically, if $A\of X$ we define $\tau(X)=X\setminus A=X\intersect B$, and observe that $X\of Y$ if and only if $\tau(X)\of\tau(Y)$, for the sets already containing $A$. And if $Z\of B$, then $Z=\tau(A\union Z)$. So $\tau$ is an isomorphism of the lattice $A{\uparrow}$ with $P(B)$, which by statement (2) is isomorphic to the whole lattice $P(\N)$, as desired.

Finally, let us fix two infinite coinfinite sets $A,B\of \N$. In particular, $A$ and $B$ are in bijection, as are the complements $\N\setminus A$ and $\N\setminus B$. By combining these two bijections, we get a bijection $\pi:\N\to\N$ of $\N$ with itself that carries $A$ exactly to $B$, that is, for which $\pi[A]=B$. (The reader will have an exercise in my book to give a careful account of this construction.) Because $\pi$ is a bijection of $\N$ with itself, it induces a corresponding automorphism of the lattice, defined by $\bar\pi(X)=\pi[X]$ for $X\of\N$. This is an isomorphism of $P(\N)$ with itself, and $\bar\pi(A)=\pi[A]=B$, as desired. $\Box$
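The combined bijection in statement (3) can be computed explicitly, at least for decidable sets. Here is a Python sketch (my own illustration, with helper names of my own): to evaluate $\pi(n)$, find the position of $n$ in the increasing enumeration of $A$ or of its complement, and return the element in the same position of the enumeration of $B$ or of its complement, respectively:

```python
from itertools import count, islice

def nth(pred, i):
    """The i-th element (from 0) of {n : pred(n)}, in increasing order."""
    return next(islice((n for n in count() if pred(n)), i, None))

def index_in(pred, n):
    """The position of n in the increasing enumeration of {m : pred(m)}."""
    return sum(1 for m in range(n) if pred(m))

def pi(n, in_A, in_B):
    """A bijection of N with itself carrying {n : in_A(n)} onto
    {n : in_B(n)}, matching enumerations of the sets and complements."""
    if in_A(n):
        return nth(in_B, index_in(in_A, n))
    return nth(lambda m: not in_B(m), index_in(lambda m: not in_A(m), n))

in_evens = lambda n: n % 2 == 0      # A: the even numbers
in_threes = lambda n: n % 3 == 0     # B: the multiples of three
image = {pi(n, in_evens, in_threes) for n in range(20) if in_evens(n)}
print(sorted(image))   # [0, 3, 6, ..., 27]: pi carries A into B
```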

Statement (3) of the theorem shows that any two infinite coinfinite sets, being automorphic images of one another, play the same structural role in the lattice $P(\N)$—they will have all the same lattice-theoretic properties. In light of this theorem, therefore, one should not look for structure theorems concerning particular sets in the lattice. Rather, the structure theory of $P(\N)$ is about the structure as a whole, as with the theorems I have proved above.

This is a selection from my book in progress Topics in Logic. It appears in chapter 3, which is on relational logic, and which includes a section on orders with a subsection specifically on the lattice of all sets of natural numbers, because I find it fascinating.

What is the game of recursive chess?

Consider this fascinating vision of recursive chess — the etching below was created by Django Pinter, a tutorial student of mine who has just completed his degree in the PPL course here at Oxford, given to me as a parting gift at the conclusion of his studies. Django’s idea was to play chess, but in order for a capture to occur successfully on the board, as here with the black Queen attempting to capture the opposing white knight, the two pieces would themselves sit down for their own game of (recursive) chess. The idea was that the capture would be successful only in the event that the subgame was won. Notice in the image that not only is there a smaller recursive game of chess, but also a still tinier subrecursive game below that (if you inspect closely), while at the same time larger pieces loom in the background, indicating that the main board itself is already several levels deep in the recursion.

The recursive chess idea may seem clear enough initially, and intriguing. But with further reflection, we might wonder: how exactly does it work? What precisely is the game of recursive chess? This is my question here.

My goal with this post is to open a discussion about what ultimately should be the precise rules and operations of recursive chess. I’m not completely sure what the best rule set will be, although I do offer several proposals here. I welcome further proposals, commentary, suggestions, and criticism about how to proceed. Once we settle upon a best or most natural version of the game, then I shall hope perhaps to prove things about it. Will you help me decide what is the game of recursive chess?

Let me describe several natural proposals:

Naïve recursion. Although this seems initially to be a simple, sound proposal, ultimately I find it problematic. The naïve idea is that when one piece wants to capture another in the game at hand, then the two pieces themselves play a game of chess, starting from the usual chess starting position. I would find it natural that the attacking piece should play as white in this game, going first, and if that player wins the subgame, then the capture in the current game is successful. If the subgame is a loss, then the capture is unsuccessful.

There seem, however, to be a variety of ways to handle the losing subgame outcome, and since these will apply with several of the other proposals, let me record them here:

  • Failed-capture. If the subgame is lost, then the capture in the current game simply does not occur. Both pieces remain on their original squares, and the turn of play passes to the opponent. Notice that this will have serious effects in certain chess situations involving zugzwang, a position where a player has no good move — because if one of the available moves is a capture, then the player can aim to play badly in the subgame and thereby legally pass the turn of play to the opponent without having made any move. This situation will in effect cause the subgame players to attempt to lose, rather than win, which seems undesirable.
  • Failed-capture-with-penalty. If the subgame is lost, then the capture does not occur, but furthermore, the attacking piece is itself lost, removed from the board, and the turn of play passes to the opponent. In effect, under this rule, every attempt at capture is putting the life of the capturing piece at risk, which makes a certain amount of sense from a military point of view. I think perhaps this is a good rule.
  • Failed-capture-with-retry. If the subgame is lost, then the capture does not occur, but both pieces remain on their original squares, and the attacking player is allowed to proceed with another (different) move. Attempting the same attack from the same board position multiple times is subject to the three-fold repetition rule. This interpretation amounts in effect to the game play searching a large part of the game tree, exploring the possible capturing moves, but with the first successful option fixed as official. It invites manipulation by the opponent, who might play badly against a misguided capture attempt, causing it to be fixed as the official move.
  • Drawn subgame. A further complication arises from the fact that the subgame can itself be drawn, rather than won. Is this sufficient to cause the penalty or the retry? Or does this count as a failed capture?

As I see it, however, the principal problem with the naïve recursion rule is that it seems to be ill-founded. After all, we can have a game with a capture, which leads to a subgame with a capture, which leads to a deeper subgame with a capture, and so on descending forever. How is the outcome determined in this infinitely descending situation? None of the subgames is ever resolved with a definite conclusion until all of them are, and there seems no coherent way to assign resolutions to them. All infinitely many subgames are simply left hanging mid-play, and indeed mid-move. For this reason, the naïve recursion idea seems ultimately incoherent to me as a game rule.

What we would seem to need instead is a well-founded recursion, one which would ultimately bottom out in a base case. With such a recursion, every outcome of the game would be well-defined. Such a well-founded recursion would be achieved, for example, if in every subgame there were strictly fewer pieces. Eventually, the subgames would reduce to king versus king, a drawn game, and then the drawn-subgame rule would be invoked to whatever effect it causes. But the recursion would definitely terminate. And perhaps most recursions would terminate because the stronger player was ultimately mating in all his attacks, without requiring any invocation of the drawn-subgame rule.

We can easily describe several natural subgame positions with one fewer piece. For example, when one piece attacks another, we may naturally consider the positions that would result if we performed the capture, or if we removed the attacking piece; and we might further consider swapping the roles of the players in these positions. Such cases would constitute a well-founded recursion, because the subgame position would have fewer pieces than the main position. In this way, we arrive at several natural recursion rules for recursive chess.

Proof-of-sufficiency recursion. The motivating idea of this recursion rule is that in order for an attack to be successful, the attacking player must prove that it will be sufficient for the attack to be successful. So, when piece A attacks piece B in the game, then a subgame is set up from the position that would result if A were successfully to capture B, and the players proceed in the game in which the attack has occurred. This is the same as proceeding in the main game, under the assumption that the attack was successful. If the attacking player can win this subgame, then it shows in a sense the sufficiency of the attack as a means to win, and so on the proof-of-sufficiency idea, we allow this as a reason for the attack to have been successful.

One might object that this recursion seems at first to amount to just playing chess as usual, since the subgame is the same as the original game with the attack having succeeded. But there is a subtle difference. For a misguided attack, the attacked player can play suboptimally in the subgame, intentionally losing that game, and then play differently in the main game. There is, of course, no obligation that the players respond the same at the higher-level games as in the lower games, and this is all part of their strategic calculation.

Proof-of-necessity recursion. The motivating idea of this recursion rule, in contrast, is that in order for an attack to be successful, the attacking player must prove that it is necessary that the attack take place. When piece A attacks piece B in the main game, then a subgame is set up in which the attack has not succeeded; instead, the attacking piece is lost, and the color sides of the players are swapped. If a black Queen attacks a white knight, for example, then in the subgame position, the queen is removed, and the players proceed from that position, but with the opposite colors. By winning this subgame, the attacking player is in effect demonstrating that if the attack were to fail, then it would be devastating for them. In other words, they are demonstrating the necessity of the success of the attack.

For both the proof-of-sufficiency and the proof-of-necessity versions of the recursion, it seems to me that any of the three failed-capture rules would be sensible. And so we have quite a few different interpretations here of what is the game of recursive chess.

What is your proposal? Please let me know in the comments. I am interested to hear any comments or criticism.

Plenitudinous Primes!

A proof of the infinitude of primes, in meter.

Let’s prove the primes’ infinitude
they do exist in multitude

of quite astounding magnitude
we count the primes in plenitude!

‘Twas proved by Euclid way back when
in Elements he argued then

He proved it with exactitude
in his Book IX he did conclude

for any given finite list
there will be primes the list has missed

to prove this gem let number N
be when you multiply them then

multiply the list for fun,
multiply, and then add one

If into N we shall divide
the listed primes seen in that guide

remainder one each leaves behind
so none as factors could we find

but every number has some prime factor
with our N take such an actor

since it can’t be on the list
we thus deduce more primes exist

No finite list can be complete
and so the claim is in retreat

By iterating this construction
primes do flow in vast deduction

as many as we’d like to see
so infinite the primes must be

Thus proved the primes’ infinitude
in quite enormous multitude

of arb’trarly great magnitude
we count the primes in plenitude!

‘Twas known by Euclid way back then
he proved it in the Elements when

in modern times we know some more
math’s never done, there’s new folklore:

For Bertrand, Chebyshev had said
about the puzzle in his head

and Erdős said it once again:
there’s always a prime between n and 2n.

We proved the primes’ infinitude
they do exist in multitude

of inconceiv’ble magnitude
we count the primes in plenitude!

The primes are plenitudinous
they’re truly multitudinous!


Stay tuned for a musical video production of Plenitudinous Primes by Hannah Hoffman, currently in production. I can’t wait!

A gentle introduction to Boolean-valued model theory

This is an excerpt from my book-in-progress on diverse elementary topics in logic, from the chapter on model theory.  My view is that Boolean-valued models should be elevated to the status of a standard core topic of elementary model theory, adjacent to the ultrapower construction. The theory of Boolean-valued models is both beautiful and accessible, both generalizing and explaining the ultrapower method, while also partaking of deeper philosophical issues concerning multi-valued truth.

Boolean-valued models

Let us extend our familiar model concept from classical predicate logic to the comparatively fantastic realm of multi-valued predicate logic. We seek a multi-valued-truth concept of model, with a domain of individuals of some kind and interpretations for the relations, functions, and constants from a first-order language, but in which the truths of the model are not necessarily black and white, but exhibit their truth values in a given multi-valued logic. Let us focus especially on the case of Boolean-valued models, using a complete Boolean algebra $\newcommand\B{\mathbb{B}}\B$, since this case has been particularly well developed and successful; the set-theoretic method of forcing, for example, amounts to using $\B$-valued models of set theory with carefully chosen complete Boolean algebras $\B$ and has been used impressively to establish a sweeping set-theoretic independence phenomenon, showing that numerous set-theoretic principles such as the axiom of choice and the continuum hypothesis are independent of the other axioms of set theory.

The main idea of Boolean-valued models, however, is applicable with any theory, not just set theory, and there are Boolean-valued graphs, Boolean-valued orders, Boolean-valued groups, rings, and fields. So let us develop a little of this theory, keeping in mind that the basic construction and ideas also often work fruitfully with other multi-valued logics, not just Boolean algebras.

Definition of Boolean-valued models

We defined in an earlier section what it means to be a model in classical first-order predicate logic—one specifies the domain of the model, a set of individuals that will constitute the universe of the model over which all the quantifiers will range, and then provides interpretations on that domain for each relation symbol, function symbol, and constant symbol in the language. For each relation symbol $R$, for example, and any suitable tuple of individuals $a,b$ from the domain, one specifies whether $R(a,b)$ is to hold or fail in the model.

The main idea for defining what it means to be a model in multi-valued predicate logic is to replace these classical yes-or-no atomic truths with multi-valued atomic truth assertions. Specifically, for any Boolean algebra $\B$, a $\B$-valued model $\mathcal{M}$ in the language $\mathcal{L}$ consists of a domain $M$, whose elements are called names, and an assignment of $\B$-valued truth values for all the simple atomic formulas using parameters from that domain:$\newcommand\boolval[1]{[\![#1]\!]}$\begin{eqnarray*}
s=t&\mapsto& \boolval{s=t} \\
R(s_0,\ldots,s_n) &\mapsto& \boolval{R(s_0,\ldots,s_n)} \\
y=f(s_0,\ldots,s_n) &\mapsto& \boolval{y=f(s_0,\ldots,s_n)}
\end{eqnarray*}
The double-bracketed expressions here $\boolval{\varphi}$ denote an element of the Boolean algebra $\B$—one thinks of $\boolval{\varphi}$ as the $\B$-truth value of $\varphi$ in the model, and the model is defined by specifying these values for the simple atomic formulas.

We shall insist in any Boolean-valued model that the atomic truth values obey the laws of equality:
$$\begin{array}{rcl}
\boolval{s=s} &=& 1\\
\boolval{s=t} &=& \boolval{t=s}\\
\boolval{s=t}\wedge\boolval{t=u} &\leq& \boolval{s=u}\\
\bigwedge_{i<n}\boolval{s_i=t_i}\wedge\boolval{R(\vec s)} &\leq & \boolval{R(\vec t)}.\end{array}$$And if the language includes function symbols, then we also insist on the functionality axioms:
$$\begin{array}{rcl}
\bigwedge_{i<n}\boolval{s_i=t_i}\wedge\boolval{y=f(\vec s)} &\leq& \boolval{y=f(\vec t)}\\
\bigvee_{t\in M}\boolval{t=f(\vec s)} &=& 1\\
\boolval{t_0=f(\vec s)}\wedge\boolval{t_1=f(\vec s)} &\leq& \boolval{t_0=t_1}.\\
\end{array}$$
These requirements assert respectively in the Boolean-valued context that the equality axiom holds for function application, that the function takes on some value, and that the function takes on at most one value. In effect, the Boolean-valued model treatment of functions regards them as defined by their graph relations, and these functionality axioms ensure that the relation has the features that would be expected of such a graph relation.

In summary, a $\B$-valued model is a domain of names together with $\B$-valued truth assignments for the simple atomic assertions about those names, which obey the equality and functionality axioms.

Boolean-valued equality

The nature of equality in the Boolean-valued setting offers an interesting departure from the usual classical treatment that is worth discussing explicitly. Namely, in classical predicate logic with equality, distinct elements of the domain are taken to represent distinct individuals in the model—the assertion $a=b$ is true in the model only when $a$ and $b$ are identical as elements of the domain. In the Boolean-valued setting, however, we want to relax this in order to allow equality assertions to have an intermediate Boolean value. For this reason, distinct elements of the domain can no longer be taken to represent definitely distinct individuals, since the equality assertion $\boolval{a=b}$ might have some nonzero or intermediate degree of truth. The elements of the domain of a Boolean-valued model are allowed in effect a bit of indecision or indeterminacy about who they are and whether they might be identical. This is why we think of the elements of the domain as names for individuals, rather than as the individuals themselves. Two different names might name the same individual, or different individuals, with an intermediate degree of truth.

Extending to Boolean-valued truth

By definition, every $\B$-valued model provides $\B$-valued truth assertions for the simple atomic formulas. We now extend these Boolean-valued truth assignments to all assertions of the language through the following natural recursion:
$$\begin{array}{rcl}
\boolval{\varphi\wedge\psi} &=& \boolval{\varphi}\wedge\boolval{\psi}\\
\boolval{\neg\varphi} &=& \neg\boolval{\varphi}\\
\boolval{\exists x\,\varphi(x,\vec s)} &=& \bigvee_{t\in M}\boolval{\varphi(t,\vec s)},\text{ if this supremum exists in }\B\\
\end{array}$$
On the right-hand side, we make the logical calculations in each case using the operations of the Boolean algebra $\B$. In the existential case, the supremum may range over an infinite set of Boolean values, if the domain of the model is infinite, and so this supremum might not be realized as an element of $\B$, if $\B$ is not complete. This is precisely why one commonly considers $\B$-valued models especially in the case of a complete Boolean algebra $\B$—the completeness of $\B$ ensures that this supremum exists and so the recursion has a solution.

The reader may check by induction on $\varphi$ that the general equality axiom now has Boolean value one.
$$\boolval{\vec s=\vec t\wedge \varphi(\vec s)\implies\varphi(\vec t)}=1$$
The Boolean-valued structure $\mathcal{M}$ is full if for every formula $\varphi(x,\vec y)$ in $\mathcal{L}$ and every tuple $\vec s$ from $M$, there is some $t\in M$ such that $\boolval{\exists x\,\varphi(x,\vec s)}=\boolval{\varphi(t,\vec s)}$. That is, a model is full when it is rich enough with names that the Boolean value of every existential statement is realized by a particular witness, reducing the infinitary supremum in the recursive definition to a single largest element.

A simple example

For every natural number $n\geq 3$ let $C_n$ be the graph on $n$ vertices forming an $n$-cycle with edge relation $x\sim y$. So we have a triangle $C_3$, a square $C_4$, a pentagon $C_5$, and so on. Let $\B=P(\set{n\in\newcommand\N{\mathbb{N}}\N\mid n\geq 3})$ be the power set of these indices, which forms a complete Boolean algebra. We define a $\B$-valued graph model $\mathcal{C}$ as follows. We take as names the sequences $a=\langle a_n\rangle_{n\geq 3}$ with $a_n\in C_n$ for every $n$.

We define the equality and $\sim$ edge relations on these names as follows:
\begin{eqnarray*}
\boolval{a=b}&=&\set{n\mid a_n=b_n} \\
\boolval{a\sim b}&=&\set{n\mid C_n\newcommand\satisfies{\models}\satisfies a_n\sim b_n}.
\end{eqnarray*}
Thus, two names are equal exactly to the extent that they have the same coordinates, and two names are connected by the edge relation $\sim$ exactly to the extent that this is true on the coordinates. It is now easy to verify the equality axioms, since $a\sim b$ is true to at least the extent that $a'\sim b'$, $a=a'$, and $b=b'$ are true, since if $a_n=a'_n$, $b_n=b'_n$, and $a'_n\sim b'_n$ in $C_n$, then also $a_n\sim b_n$. So this is a $\B$-valued graph model.
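To make the Boolean-valued truth computations concrete, here is a minimal Python sketch, under the simplifying assumption (mine, purely for illustration) that we truncate the family to the cycles $C_3,\dots,C_6$, so that Boolean values become subsets of $\{3,4,5,6\}$, with meet as intersection and join as union; the helper names are my own. The sketch evaluates the truth recursion directly for the completeness assertion discussed just below.

```python
from itertools import product

# Illustrative truncation (an assumption of this sketch): the cycles
# C_3..C_6 stand in for the full infinite family C_n, n >= 3.
N = [3, 4, 5, 6]

def adjacent(n, u, v):
    """Edge relation of the n-cycle on vertices 0..n-1."""
    return (u - v) % n in (1, n - 1)

# Names are tuples <a_n> with a_n a vertex of C_n; Boolean values are
# subsets of N, with meet = intersection and join = union.
names = list(product(*[range(n) for n in N]))

def val_eq(a, b):      # [[ a = b ]]
    return frozenset(n for k, n in enumerate(N) if a[k] == b[k])

def val_edge(a, b):    # [[ a ~ b ]]
    return frozenset(n for k, n in enumerate(N) if adjacent(n, a[k], b[k]))

# Boolean value of the completeness assertion forall x,y (x=y or x~y):
# the universal quantifier is an infimum, here an intersection over names.
value = frozenset(N)
for a in names:
    for b in names:
        value &= val_eq(a, b) | val_edge(a, b)

print(value)  # frozenset({3}): the triangle is the only complete cycle
```

The computed value $\{3\}$ is exactly the set of indices $n$ for which $C_n$ satisfies the assertion, in accordance with the coordinatewise theorem proved below.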

So we have a Boolean-valued graph. Is it connected? Does that question make sense? Of course, we don’t have an actual graph here, but only a $\B$-valued graph, and so in principle we only know how to compute the Boolean value of statements that are expressible in the language of graph theory. Since connectivity is not formally expressible, except in the bounded finite cases, this question might not be sensible.

Let me argue that it is indeterminate whether this graph is a complete graph with every pair of distinct nodes connected by an edge. After all, $C_3$ is the complete graph on three vertices, and it will follow from the theorem below that the Boolean value of the statement $\forall x,y\,(x=y\vee x\sim y)$ contains the element $3$ and therefore this Boolean value is not zero. Meanwhile, the assertion that the graph is not complete also gets a nonzero Boolean value, since every $C_n$ except $C_3$ has distinct nodes with no edge between them. In a robust sense, the graph-theoretic truths of $\mathcal{C}$ combine the truths of all the various graphs $C_n$.

Note also that as $n$ increases, the graphs $C_n$ have nodes that are increasingly far apart. Fix any name $\langle a_n\rangle_{n\geq 3}$ and choose $\langle b_n\rangle_{n\geq 3}$ such that the distances between $a_n$ and $b_n$ in $C_n$ are unbounded as $n$ increases. In $\mathcal{C}$, it follows that $a$ and $b$ have no particular definite finite distance, and this can be viewed as a sense in which $\mathcal{C}$ is not connected.

A combination of many models

Let us flesh out the previous example with a more general analysis. Suppose we have a family of models $\set{M_i\mid i\in I}$ in a common language $\mathcal{L}$. Let $\B=P(I)$ be the power set of the set of indices $I$, a complete Boolean algebra. We define a $\B$-valued model $\mathcal{M}$ as follows. The set of names will be precisely the product of the models $M=\prod_i M_i$, which is to say, the $I$-tuples $s=\langle s_i\mid i\in I\rangle$ with $s_i\in M_i$ for every $i\in I$, and the simple atomic truth values are defined like this:
\begin{eqnarray*}
\boolval{s=t}&=&\set{i\in I\mid s_i=t_i} \\
\boolval{R(s,\ldots,t)}&=&\set{i\in I\mid M_i\satisfies R(s_i,\dots,t_i)}\\
\boolval{u=f(s,\ldots,t)} &=& \set{i\in I\mid M_i\satisfies u_i=f(s_i,\dots,t_i)}.
\end{eqnarray*}
One can now prove the equality and functionality axioms, and so this is a $\B$-valued model.

Theorem. The Boolean-valued model $\mathcal{M}$ described above is full, and Boolean-valued truth can be computed coordinatewise:
$$\boolval{\varphi(s,\dots,t)}=\set{i\in I\mid M_i\satisfies\varphi(s_i,\dots,t_i)}.$$

Proof. We prove this by induction on $\varphi$, for assertions using only simple atomic formulas, $\wedge$, $\neg$, and $\exists$. The simple atomic case is part of what it means to be a Boolean-valued model. If the theorem is true for $\varphi$, then it will be true for $\neg\varphi$, since negation in $\B$ is complementation in $I$, as follows:
$$\boolval{\neg\varphi}=\neg\boolval{\varphi}=I-\set{i\in I\mid M_i\satisfies\varphi}=\set{i\in I\mid M_i\satisfies\neg\varphi}.$$
If the theorem is true for $\varphi$ and $\psi$, then it will be true for $\varphi\wedge\psi$ as follows:
\begin{eqnarray*}
\boolval{\varphi\wedge\psi} &=& \boolval{\varphi}\wedge\boolval{\psi} \\
&=& \set{i\in I\mid M_i\satisfies\varphi}\cap\set{i\in I\mid M_i\satisfies\psi}\\
&=& \set{i\in I\mid M_i\satisfies\varphi\wedge\psi}.
\end{eqnarray*}
For the quantifier case $\exists x\, \varphi(x,s,\dots,t)$, choose $u_i\in M_i$ for which $M_i\satisfies\varphi(u_i,s_i,\dots,t_i)$, if possible. It follows that
$$\set{i\in I\mid M_i\satisfies\exists x\,\varphi(x,s_i,\dots,t_i)}=\set{i\in I\mid M_i\satisfies\varphi(u_i,s_i,\dots,t_i)},$$
and from this it follows that $\boolval{\varphi(v,s,\dots,t)}\leq\boolval{\varphi(u,s,\dots,t)}$ for every name $v\in M$, since whenever $v_i$ succeeds as a witness, then so also will $u_i$. Consequently, the model is full, and we see that
\begin{eqnarray*}
\boolval{\exists x\,\varphi(x,s,\dots,t)} &=& \bigvee_{v\in M}\boolval{\varphi(v,s,\dots,t)}\\
&=& \boolval{\varphi(u,s,\dots,t)}\\
&=& \set{i\in I\mid M_i\satisfies\exists x\,\varphi(x,s_i,\dots,t_i)},
\end{eqnarray*}
as desired. $\Box$

Boolean ultrapowers

Every Boolean-valued model can be transformed into a classical model, a $2$-valued model, by means of a simple quotient construction. Suppose that $\mathcal{M}$ is a $\B$-valued model for some Boolean algebra $\B$ and that $\mu\newcommand\of{\subseteq}\of\B$ is an ultrafilter on the Boolean algebra.

Define an equivalence relation $=_\mu$ on the set of names by
$$a=_\mu b\quad\text{ if and only if }\quad \boolval{a=b}\in\mu.$$
The quotient structure $\mathcal{M}/\mu$ will consist of equivalence classes $[a]_\mu$ of names by this equivalence relation. We define the atomic truths of the quotient structure similarly by reference to whether these truths hold with a value in $\mu$. Namely,
$$R^{\mathcal{M}/\mu}([a]_\mu,\ldots,[b]_\mu)\quad\text{ if and only if }\quad \boolval{R(a,\dots,b)}\in\mu.$$
For function symbols $f$, we define
$$[c]_\mu=f([a]_\mu,\dots,[b]_\mu)\quad\text{ if and only if }\quad \boolval{c=f(a,\dots,b)}\in\mu.$$
These definitions are well-defined modulo $=_\mu$ precisely because of the equality axiom properties of the Boolean-valued model $\mathcal{M}$. For example, if $a=_\mu a'$, then $\boolval{a=a'}\in\mu$, but $\boolval{a=a'}\wedge\boolval{R(a)}\leq \boolval{R(a')}$ by the equality axiom, and so if $\boolval{R(a)}$ is in $\mu$, then so will be $\boolval{R(a')}$.

We defined atomic truth in the quotient structure by reference to the truth value being $\mu$-large. In fact, this reduction will continue for all truth assertions in the quotient structure, which we prove as follows.

Theorem. (Łoś theorem for Boolean ultrapowers)
Suppose that $\mathcal{M}$ is a full $\B$-valued model for a Boolean algebra $\B$, and that $\mu\of\B$ is an ultrafilter. Then a statement is true in the Boolean quotient structure $\mathcal{M}/\mu$ if and only if the Boolean value of the statement is in $\mu$.
$$\mathcal{M}/\mu\satisfies\varphi([a]_\mu,\dots,[b]_\mu)\quad\text{ if and only if }\quad\boolval{\varphi(a,\dots,b)}\in\mu.$$

Proof. We prove this by induction on $\varphi$. The simple atomic case holds by the definition of the quotient structure. Boolean connectives $\wedge$, $\neg$ follow easily using that $\mu$ is a filter and that it is an ultrafilter. Consider the quantifier case $\exists x\, \varphi(a,\dots,b,x)$, where by induction the theorem holds for $\varphi$ itself. If the quotient structure satisfies the existential $\mathcal{M}/\mu\satisfies\exists x\,\varphi([a],\dots,[b],x)$, then there is a name $c$ for which it satisfies $\varphi([a],\dots,[b],[c])$, and so by induction $\boolval{\varphi(a,\dots,b,c)}\in\mu$, in which case also $\boolval{\exists x\,\varphi(a,\dots,b,x)}\in\mu$, since this latter Boolean value is at least as great as the former. Conversely, if $\boolval{\exists x\,\varphi(a,\dots,b,x)}\in\mu$, then by fullness this existential Boolean value is realized by some name $c$, and so $\boolval{\varphi(a,\dots,b,c)}\in\mu$, which by induction implies that the quotient satisfies $\varphi([a],\dots,[b],[c])$ and consequently also $\exists x\,\varphi([a],\dots,[b],x)$, as desired. $\Box$

Note how fullness was used in the existential case of the inductive argument.
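For a concrete illustration, here is a small Python sketch of the quotient construction, under the simplifying assumption (again, for illustration only) that the index set $I$ is finite, so that every ultrafilter on $P(I)$ is principal, generated by some index $i_0$; the Łoś theorem then says the Boolean ultrapower simply recovers a copy of the factor $M_{i_0}$. The names and helpers here are mine, continuing the cycle-graph example.

```python
from itertools import product

# Quotient of the product of the cycles C_n (n in I) by the principal
# ultrafilter mu = { A subseteq I : i0 in A }.
I = [3, 4, 5]
i0 = 5
k = I.index(i0)
names = list(product(*[range(n) for n in I]))   # product of C_3, C_4, C_5

def val_eq(a, b):        # [[ a = b ]] as a subset of I
    return {n for j, n in enumerate(I) if a[j] == b[j]}

def eq_mod_mu(a, b):     # a =_mu b  iff  [[ a = b ]] is in mu
    return i0 in val_eq(a, b)

# Collect one representative per =_mu class; two names are identified
# exactly when they agree on the i0 coordinate.
reps = []
for a in names:
    if not any(eq_mod_mu(a, r) for r in reps):
        reps.append(a)

def edge_mod_mu(a, b):   # atomic Los: [[ a ~ b ]] in mu iff adjacent in C_5
    return (a[k] - b[k]) % i0 in (1, i0 - 1)

print(len(reps))                      # 5 classes: one per vertex of C_5
print(edge_mod_mu(reps[0], reps[1]))  # True: vertices 0 and 1 are adjacent
```

Of course, with a principal ultrafilter the quotient merely recovers one of the factors; the interesting applications use nonprincipal ultrafilters on infinite index sets, where no single factor controls the truth values.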

Ode to Hippasus

I was so glad to be involved with this project of Hannah Hoffman. She had inquired on Twitter whether mathematicians could provide a proof of the irrationality of root two that rhymes. I set off immediately, of course, to answer the challenge. My wife Barbara Gail Montero and our daughter Hypatia and I spent a day thinking, writing, revising, rewriting, rethinking, rewriting, and eventually we had a set of lyrics providing the proof, in rhyme and meter. We had wanted specifically to highlight not only the logic of the proof, but also to tell the fateful story of Hippasus, credited with the discovery.

Hannah proceeded to create the amazing musical version:

The diagonal of a square is incommensurable with its side
an astounding fact the Pythagoreans did hide

but Hippasus rebelled and spoke the truth
making his point with irrefutable proof

it’s absurd to suppose that the root of two
is rational, namely, p over q

square both sides and you will see
that twice q squared is the square of p

since p squared is even, then p is as well
now, if p as 2k you alternately spell

2q squared will to 4k squared equate
revealing, when halved, q’s even fate

thus, root two as fraction, p over q
must have numerator and denominator with factors of two

to lowest terms, therefore, it can’t be reduced
root two is irrational, Hippasus deduced

as retribution for revealing this irrationality
Hippasus, it is said, was drowned in the sea

but his proof lives on for the whole world to admire
a truth of elegance that will ever inspire.

A review of the Gödel fixed-point lemma, with generalizations and applications

This brief unpublished note (11 pages) contains an overview of the Gödel fixed-point lemma, along with several generalizations and applications, written for use in the Week 3 lecture of the Graduate Philosophy of Logic seminar that I co-taught with Volker Halbach at Oxford in Hilary term 2021. The theme of the seminar was self-reference, truth, and consistency strengths, and in this lecture we discussed the nature of Gödel’s fixed-point lemma and generalizations, with various applications in logic.

Contents

  1. Introduction
  2. Gödel’s fixed-point lemma
    An application to the Gödel incompleteness theorem
  3. Finite self-referential schemes
    An application to nonindependent disjunctions of independent sentences
  4. Gödel-Carnap fixed point lemma
    Deriving the double fixed-point lemma as a consequence
    An application to the provability version of Yablo’s paradox
  5. Kleene recursion theorem
    An application involving computable numbers
    An application involving the universal algorithm
    An application to Quine programs and Ouroborous chains
  6. References

Cantor’s Ice Cream Shoppe

Welcome to Cantor’s Ice Cream Shoppe! A huge choice of flavors—pile your cone high with as many scoops as you want!

Have two scoops, or three, four, or more! Why not infinitely many? Would you like $\omega$ many scoops, or $\omega\cdot2+5$ many scoops? You can have any countable ordinal number of scoops on your cone.

And furthermore, after ordering your scoops, you can order more scoops to be placed on top—all I ask is that you let me know how many such extra orders you plan to make. Let’s simply proceed transfinitely. You can announce any countable ordinal $\eta$, which will be the number of successive orders you will make; each order is a countable ordinal number of ice cream scoops to be placed on top of whatever cone is being assembled.

In fact, I’ll even let you change your mind about $\eta$ as we proceed, so as to give you more orders to make a taller cone.

So the process is:

  • You pick a countable ordinal $\eta$, which is the number of orders you will make.
  • For each order, you can pick any countable ordinal number of scoops to be added to the top of your ice-cream cone.
  • After making your order, you can freely increase $\eta$ to any larger countable ordinal, giving you the chance to make as many additional orders as you like.

At each limit stage of the ordering process, the ice cream cone you are assembling has all the scoops you’ve ordered so far, and we set the current $\eta$ value to the supremum of the values you had chosen so far.

If at any stage, you’ve used up your $\eta$ many orders, then the process has completed, and I serve you your ice cream cone. Enjoy!

Question. Can you arrange to achieve uncountably many scoops on your cone?

Although at each stage we place only countably many ice cream scoops onto the cone, nevertheless we can keep giving ourselves extra stages, as many as we want, simply by increasing $\eta$. Can you describe a systematic process of increasing the number of steps that will enable you to make uncountably many orders? This would achieve an uncountable ice cream cone.

What is your solution? Give it some thought before proceeding. My solution appears below.

Alas, I claim that at Cantor’s Ice Cream Shoppe you cannot make an ice cream cone with uncountably many scoops. Specifically, I claim that there will inevitably come a countable ordinal stage at which you have used up all your orders.

Suppose that you begin by ordering $\beta_0$ many scoops, and setting a large value $\eta_0$ for the number of orders you will make. You subsequently order $\beta_1$ many additional scoops, and then $\beta_2$ many on top of that, and so on. At each stage, you may also have increased the value of $\eta_0$ to $\eta_1$ and then $\eta_2$ and so on. Probably all of these are enormous countable ordinals, making a huge ice cream cone.

At each stage $\alpha$, provided $\alpha<\eta_\alpha$, you can make an order of $\beta_\alpha$ many scoops on top of your cone, and increase $\eta_\alpha$ to $\eta_{\alpha+1}$, if desired, or keep it the same.

At a limit stage $\lambda$, your cone has $\sum_{\alpha<\lambda}\beta_\alpha$ many scoops, and we update the $\eta$ value to the supremum of your earlier declarations $\eta_\lambda=\sup_{\alpha<\lambda}\eta_\alpha$.

What I claim now is that there will inevitably come a countable stage $\lambda$ for which $\lambda=\eta_\lambda$, meaning that you have used up all your orders with no possibility to further increase $\eta$. To see this, consider the sequence $$\eta_0\leq \eta_{\eta_0}\leq \eta_{\eta_{\eta_0}}\leq\cdots$$ We can define the sequence recursively by $\lambda_0=\eta_0$ and $\lambda_{n+1}=\eta_{\lambda_n}$. Let $\lambda=\sup_{n<\omega}\lambda_n$, the limit of this sequence. This is a countable supremum of countable ordinals and hence countable. But notice that $$\eta_\lambda=\sup_{n<\omega}\eta_{\lambda_n}=\sup_{n<\omega}\lambda_{n+1}=\lambda.$$ That is, $\eta_\lambda=\lambda$ itself, and so your orders have run out at $\lambda$, with no possibility to add more scoops or to increase $\eta$. So your order process completed at a countable stage, and you have therefore altogether only a countable ordinal number of scoops of ice cream. I’m truly very sorry at your pitiable impoverishment.

The otherworldly cardinals

I’d like to introduce and discuss the otherworldly cardinals, a large cardinal notion that frequently arises in set-theoretic analysis, but which seems not yet to have been given its own special name. So let us do so here.

I was put on to the topic by Jason Chen, a PhD student at UC Irvine working with Toby Meadows, who brought up the topic recently on Twitter:

In response, I had suggested the otherworldly terminology, a play on the fact that the two cardinals will both be worldly, and so we have in essence two closely related worlds, looking alike. We discussed the best way to implement the terminology and its extensions. The main idea is the following:

Main Definition. An ordinal $\kappa$ is otherworldly if $V_\kappa\prec V_\lambda$ for some ordinal $\lambda>\kappa$. In this case, we say that $\kappa$ is otherworldly to $\lambda$.

It is an interesting exercise to see that every otherworldly cardinal $\kappa$ is in fact also worldly, which means $V_\kappa\models\text{ZFC}$, and from this it follows that $\kappa$ is a strong limit cardinal and indeed a $\beth$-fixed point and even a $\beth$-hyperfixed point and more.

Theorem. Every otherworldly cardinal is also worldly.

Proof. Suppose that $\kappa$ is otherworldly, so that $V_\kappa\prec V_\lambda$ for some ordinal $\lambda>\kappa$. It follows that $\kappa$ must in fact be a cardinal, since otherwise it would be the order type of a relation on a set in $V_\kappa$, which would be isomorphic to an ordinal in $V_\lambda$ but not in $V_\kappa$. And since $\omega$ is not otherworldly, we see that $\kappa$ must be an uncountable cardinal. Since $V_\kappa$ is transitive, we get now easily that $V_\kappa$ satisfies extensionality, regularity, union, pairing, power set, separation and infinity. The only axiom remaining is replacement. If $\varphi(a,b)$ obeys a functional relation in $V_\kappa$ for all $a\in A$, where $A\in V_\kappa$, then $V_\lambda$ agrees with that, and also sees that the range is contained in $V_\kappa$, which is a set in $V_\lambda$. So $V_\kappa$ agrees that the range is a set. So $V_\kappa$ fulfills the replacement axiom. $\Box$

Corollary. A cardinal is otherworldly if and only if it is fully correct in a worldly cardinal.

Proof. Once you know that otherworldly cardinals are worldly, this amounts to a restatement of the definition. If $V_\kappa\prec V_\lambda$, then $\lambda$ is worldly, and $V_\kappa$ is correct in $V_\lambda$. $\Box$

Let me prove next that whenever you have an otherworldly cardinal, then you will also have a lot of worldly cardinals, not just these two.

Theorem. Every otherworldly cardinal $\kappa$ is a limit of worldly cardinals. What is more, every otherworldly cardinal is a limit of worldly cardinals having exactly the same first-order theory as $V_\kappa$, and indeed, the same $\alpha$-order theory for any particular $\alpha<\kappa$.

Proof. If $V_\kappa\prec V_\lambda$, then $V_\lambda$ can see that $\kappa$ is worldly and has the theory $T$ that it does. So $V_\lambda$ thinks, about $T$, that there is a cardinal whose rank initial segment has theory $T$. Thus, $V_\kappa$ also thinks this. And we can find arbitrarily large $\delta$ up to $\kappa$ such that $V_\delta$ has this same theory. This argument works whether one uses the first-order theory, or the second-order theory or indeed the $\alpha$-order theory for any $\alpha<\kappa$. $\Box$

Theorem. If $\kappa$ is otherworldly, then for every ordinal $\alpha<\kappa$ and natural number $n$, there is a cardinal $\delta<\kappa$ with $V_\delta\prec_{\Sigma_n}V_\kappa$ and such that the $\alpha$-order theory of $V_\delta$ is the same as that of $V_\kappa$.

Proof. One argues just as above: $V_\lambda$ can see that $V_\kappa$ has the $\alpha$-order theory that it does, while also agreeing with $V_\lambda$ on $\Sigma_n$ truth, and so $V_\kappa$ will agree that there is such a cardinal $\delta<\kappa$. $\Box$

Definition. We say that a cardinal is totally otherworldly, if it is otherworldly to arbitrarily large ordinals. It is otherworldly beyond $\theta$, if it is otherworldly to some ordinal larger than $\theta$. It is otherworldly up to $\delta$, if it is otherworldly to ordinals cofinal in $\delta$.

Theorem. Every inaccessible cardinal $\delta$ is a limit of otherworldly cardinals that are each otherworldly up to and to $\delta$.

Proof. If $\delta$ is inaccessible, then a simple Löwenheim-Skolem construction shows that $V_\delta$ is the union of a continuous elementary chain $$V_{\kappa_0}\prec V_{\kappa_1}\prec\cdots\prec V_{\kappa_\alpha}\prec \cdots \prec V_\delta$$ Each of the cardinals $\kappa_\alpha$ arising on this chain is otherworldly up to and to $\delta$. $\Box$

Theorem. Every totally otherworldly cardinal is $\Sigma_2$ correct, meaning $V_\kappa\prec_{\Sigma_2} V$. Consequently, every totally otherworldly cardinal is larger than the least measurable cardinal, if it exists, and larger than the least superstrong cardinal, if it exists, and larger than the least huge cardinal, if it exists.

Proof. Every $\Sigma_2$ assertion is locally verifiable in the $V_\alpha$ hierarchy, in that it is equivalent to an assertion of the form $\exists\eta V_\eta\models\psi$ (for more information, see my post about Local properties in set theory). Thus, every true $\Sigma_2$ assertion is revealed inside any sufficiently large $V_\lambda$, and so if $V_\kappa\prec V_\lambda$ for arbitrarily large $\lambda$, then $V_\kappa$ will agree on those truths. $\Box$

I was a little confused at first about how two totally otherworldly cardinals interact, but now everything is clear with this next result. (Thanks to Hanul Jeon for his helpful comment below.)

Theorem. If $\kappa<\delta$ are both totally otherworldly, then $\kappa$ is otherworldly up to $\delta$, and hence totally otherworldly in $V_\delta$.

Proof. Since $\delta$ is totally otherworldly, it is $\Sigma_2$ correct. For every $\alpha<\delta$, the cardinal $\kappa$ is otherworldly beyond $\alpha$, meaning $V_\kappa\prec V_\lambda$ for some $\lambda>\alpha$; since this is a $\Sigma_2$ feature of $\kappa$, it must already be true inside $V_\delta$. So such a $\lambda$ can be found below $\delta$, and so $\kappa$ is otherworldly up to $\delta$. $\Box$

Theorem. If $\kappa$ is totally otherworldly, then $\kappa$ is a limit of otherworldly cardinals, and indeed, a limit of otherworldly cardinals having the same theory as $V_\kappa$.

Proof. Assume $\kappa$ is totally otherworldly, let $T$ be the theory of $V_\kappa$, and consider any $\alpha<\kappa$. Since there is an otherworldly cardinal above $\alpha$ with theory $T$, namely $\kappa$, and because this is a $\Sigma_2$ fact about $\alpha$ and $T$, it follows that there must be such a cardinal above $\alpha$ inside $V_\kappa$. So $\kappa$ is a limit of otherworldly cardinals with the same theory as $V_\kappa$. $\Box$

The results above show that these hypotheses are ordered by consistency strength as follows, with strict increases in consistency strength as you go up the list (assuming consistency):

  • ZFC + there is an inaccessible cardinal
  • ZFC + there is a proper class of totally otherworldly cardinals
  • ZFC + there is a totally otherworldly cardinal
  • ZFC + there is a proper class of otherworldly cardinals
  • ZFC + there is an otherworldly cardinal
  • ZFC + there is a proper class of worldly cardinals
  • ZFC + there is a worldly cardinal
  • ZFC + there is a transitive model of ZFC
  • ZFC + Con(ZFC)
  • ZFC

We might consider the natural strengthenings of otherworldliness, where one wants $V_\kappa\prec V_\lambda$ where $\lambda$ is itself otherworldly. That is, $\kappa$ is the beginning of an elementary chain of three models, not just two. This is different from having merely that $V_\kappa\prec V_\lambda$ and $V_\kappa\prec V_\eta$ for some $\eta>\lambda$, because perhaps $V_\lambda$ is not elementary in $V_\eta$, even though $V_\kappa$ is. Extending successively is a more demanding requirement.

One then naturally wants longer and longer chains, and ultimately we find ourselves considering various notions of rank in the rank elementary forest, which is the relation $\kappa\preceq\lambda\iff V_\kappa\prec V_\lambda$. The otherworldly cardinals are simply the non-maximal nodes in this order, while it will be interesting to consider the nodes that can be extended to longer elementary chains.

The real numbers are not interpretable in the complex field

Consider the real numbers $\newcommand\R{\mathbb{R}}\R$ and the complex numbers $\newcommand\C{\mathbb{C}}\C$ and the question of whether these structures are interpretable in one another as fields.

What does it mean to interpret one mathematical structure in another? It means to provide a definable copy of the first structure in the second, by providing a definable domain of $k$-tuples (not necessarily just a domain of points) and definable interpretations of the atomic operations and relations, as well as a definable equivalence relation, a congruence with respect to the operations and relations, such that the first structure is isomorphic to the quotient of this definable structure by that equivalence relation. All these definitions should be expressible in the language of the host structure.

One may proceed recursively to translate any assertion in the language of the interpreted structure into the language of the host structure, thereby enabling a complete discussion of the first structure purely in the language of the second.

For an example, we can define a copy of the integer ring $\langle\mathbb{Z},+,\cdot\rangle$ inside the semi-ring of natural numbers $\langle\mathbb{N},+,\cdot\rangle$ by considering every integer as the equivalence class of a pair of natural numbers $(n,m)$ under the same-difference relation, by which $$(n,m)\equiv(u,v)\iff n-m=u-v\iff n+v=u+m.$$ Integer addition and multiplication can be defined on these pairs, well-defined with respect to same difference, and so we have interpreted the integers in the natural numbers.
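To make the pair arithmetic concrete, here is a small Python sketch (an illustration only, not part of the formal interpretation; the function names are mine) representing each integer by a pair of naturals under the same-difference congruence.

```python
# Integers as pairs (n, m) of naturals standing for n - m,
# compared by the same-difference congruence.
def eq(a, b):                  # (n,m) == (u,v)  iff  n + v == u + m
    n, m = a; u, v = b
    return n + v == u + m

def add(a, b):                 # (n,m) + (u,v) = (n+u, m+v)
    return (a[0] + b[0], a[1] + b[1])

def mul(a, b):                 # (n,m)(u,v) = (nu+mv, nv+mu)
    n, m = a; u, v = b
    return (n * u + m * v, n * v + m * u)

minus_two, three = (0, 2), (3, 0)         # representing -2 and 3
print(eq(mul(minus_two, three), (1, 7)))  # True: (-2)*3 = -6 = 1 - 7
```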

Similarly, the rational field $\newcommand\Q{\mathbb{Q}}\Q$ can be interpreted in the integers as the quotient field, whose elements can be thought of as integer pairs $(p,q)$ written more conveniently as fractions $\frac pq$, where $q\neq 0$, considered under the same-ratio relation
$$\frac pq\equiv\frac rs\qquad\iff\qquad ps=rq.$$
The field structure is now easy to define on these pairs by the familiar fractional arithmetic, which is well-defined with respect to that equivalence. Thus, we have provided a definable copy of the rational numbers inside the integers, an interpretation of $\Q$ in $\newcommand\Z{\mathbb{Z}}\Z$.
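Again one can check the arithmetic concretely; in this Python sketch (illustrative only, with my own helper names) the operations act on the pairs themselves, and testing different representatives of the same fraction illustrates that they are well-defined with respect to the same-ratio congruence.

```python
# Rationals as integer pairs (p, q), q != 0, under the same-ratio relation.
def frac_eq(a, b):             # p/q == r/s  iff  p*s == r*q
    p, q = a; r, s = b
    return p * s == r * q

def frac_add(a, b):            # p/q + r/s = (p*s + r*q)/(q*s)
    p, q = a; r, s = b
    return (p * s + r * q, q * s)

print(frac_eq(frac_add((1, 2), (1, 3)), (5, 6)))  # True: 1/2 + 1/3 = 5/6
print(frac_eq(frac_add((2, 4), (1, 3)), (5, 6)))  # True: same answer on the
                                                  # representative 2/4 of 1/2
```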

The complex field $\C$ is of course interpretable in the real field $\R$ by considering the complex number $a+bi$ as represented by the real number pair $(a,b)$, and defining the operations on these pairs in a way that obeys the expected complex arithmetic. $$(a,b)+(c,d) =(a+c,b+d)$$ $$(a,b)\cdot(c,d)=(ac-bd,ad+bc)$$ Thus, we interpret the complex number field $\C$ inside the real field $\R$.
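Rendered as a quick Python sketch (nothing beyond the displayed rules), the interpretation looks like this:

```python
# Complex numbers as pairs (a, b) of reals, with the displayed operations.
def cadd(z, w):
    a, b = z; c, d = w
    return (a + c, b + d)

def cmul(z, w):
    a, b = z; c, d = w
    return (a * c - b * d, a * d + b * c)

i = (0.0, 1.0)
print(cmul(i, i))  # (-1.0, 0.0): the pair representing i squares to -1
```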

Question. What about an interpretation in the converse direction? Can we interpret $\R$ in $\C$?

Although of course the real numbers can be viewed as a subfield of the complex numbers $$\R\subset\C,$$this by itself doesn’t constitute an interpretation, unless the submodel is definable. And in fact, $\R$ is not a definable subset of $\C$. There is no purely field-theoretic property $\varphi(x)$, expressible in the language of fields, that holds in $\C$ of all and only the real numbers $x$. But more: not only is $\R$ not definable in $\C$ as a subfield, we cannot even define a copy of $\R$ in $\C$ in the language of fields. We cannot interpret $\R$ in $\C$ in the language of fields.

Theorem. As fields, the real numbers $\R$ are not interpretable in the complex numbers $\C$.

We can of course interpret the real numbers $\R$ in a structure slightly expanding $\C$ beyond its field structure. For example, if we consider not merely $\langle\C,+,\cdot\rangle$ but add the conjugation operation $\langle\C,+,\cdot,z\mapsto\bar z\rangle$, then we can identify the reals as the fixed-points of conjugation $z=\bar z$. Or if we add the real-part or imaginary-part operators, making the coordinate structure of the complex plane available, then we can of course define the real numbers in $\C$ as those complex numbers with no imaginary part. The point of the theorem is that in the pure language of fields, we cannot define the real subfield nor can we even define a copy of the real numbers in $\C$ as any kind of definable quotient structure.

The theorem is well-known to model theorists, a standard observation, and model theorists often like to prove it using some sophisticated methods, such as stability theory. The main issue from that point of view is that the order in the real numbers is definable from the real field structure, but the theory of algebraically closed fields is too stable to allow it to define an order like that.

But I would like to give a comparatively elementary proof of the theorem, which doesn’t require knowledge of stability theory. After a conversation this past weekend with Jonathan Pila, Boris Zilber and Alex Wilkie over lunch and coffee breaks at the Robin Gandy conference, here is such an elementary proof, based only on knowledge concerning the enormous number of automorphisms of $\C$, a consequence of the categoricity of the complex field, which itself follows from the fact that algebraically closed fields of a given characteristic are determined by their transcendence degree over their prime subfield. It follows that any two transcendental elements of $\C$ are automorphic images of one another, and indeed, for any element $z\in\C$ any two complex numbers transcendental over $\Q(z)$ are automorphic in $\C$ by an automorphism fixing $z$.

Proof of the theorem. Suppose that we could interpret the real field $\R$ inside the complex field $\C$. So we would define a domain of $k$-tuples $R\subseteq\C^k$ with an equivalence relation $\simeq$ on it, and operations of addition and multiplication on the equivalence classes, such that the real field was isomorphic to the resulting quotient structure $R/\simeq$. There is absolutely no requirement that this structure is a submodel of $\C$ in any sense, although that would of course be allowed if possible. The $+$ and $\times$ of the definable copy of $\R$ in $\C$ might be totally strange new operations defined on those equivalence classes. The definitions altogether may involve finitely many parameters $\vec p=(p_1,\ldots,p_n)$, which we now fix.

As we mentioned, the complex number field $\C$ has an enormous number of automorphisms, and indeed, any two $k$-tuples $\vec x$ and $\vec y$ that satisfy the same algebraic equations over $\Q(\vec p)$ will be automorphic by an automorphism fixing $\vec p$. In particular, this means that there are only countably many automorphism orbits of $k$-tuples in $\C$. Since there are uncountably many real numbers, this means that there must be two $\simeq$-inequivalent $k$-tuples in the domain $R$ that are automorphic images in $\C$, by an automorphism $\pi:\C\to\C$ fixing the parameters $\vec p$. Since $\pi$ fixes the parameters of the definition, it will take $R$ to $R$ and it will respect the equivalence relation and the definition of the addition and multiplication on $R/\simeq$. Therefore, $\pi$ will induce an automorphism of the real field $\R$, which will be nontrivial precisely because $\pi$ took an element of one $\simeq$-equivalence class to another.

The proof is now completed by the observation that the real field $\langle\R,+,\cdot\rangle$ is rigid; it has no nontrivial automorphisms. This is because the order is definable (the positive numbers are precisely the nonzero squares) and the individual rational numbers must be fixed by any automorphism and then every real number is determined by its cut in the rationals. So there can be no nontrivial automorphism of $\R$, and we have a contradiction. So $\R$ is not interpretable in $\C$. $\Box$

Introducing modal model theory

Let me introduce to you the topic of modal model theory, injecting some ideas from modal logic into the traditional subject of model theory in mathematical logic.

For example, we may consider the class of all models of some first-order theory, such as the class of all graphs, or the class of all groups, or all fields or what have you. In general, we have $\newcommand\Mod{\text{Mod}}\Mod(T)$, where $T$ is a first-order theory in some language $L$.

We may consider $\Mod(T)$ as a potentialist system, a Kripke model of possible worlds, where each model accesses the larger models, of which it is a submodel. So $\newcommand\possible{\Diamond}\possible\varphi$ is true at a model $M$, if there is a larger model $N$ in which $\varphi$ holds, and $\newcommand\necessary{\Box}\necessary\varphi$ is true at $M$, if $\varphi$ holds in all larger models.

In this way, we enlarge the language $L$ to include these modal operators. Let $\possible(L)$ be the language obtained by closing $L$ under the modal operators and Boolean connectives; and let $L^\possible$ also close under quantification. The difference is whether a modal operator falls under the scope of a quantifier.

Recently, in a collaborative project with Wojciech Aleksander Wołoszyn, we made some progress, which I’d like to explain. (We also have many further results, concerning the potentialist validities of various natural instances of $\Mod(T)$, but those will wait for another post.)

Theorem. If models $M$ and $N$ are elementarily equivalent, that is, if they have the same theory in the language $L$, then they also have the same theory in the modal language $\possible(L)$.

Proof. We show that whenever $M\equiv N$ in the language of $L$, then $M\models\varphi\iff N\models\varphi$ for sentences $\varphi$ in the modal language $\possible(L)$, by induction on $\varphi$.

Of course, by assumption the statement is true for sentences $\varphi$ in the base language $L$. And the property is clearly preserved by Boolean combinations. What remains is the modal case. Suppose that $M\equiv N$ and $M\models\possible\varphi$. So there is some extension model $M\subset W\models\varphi$.

Since $M\equiv N$, it follows by the Keisler-Shelah theorem that $M$ and $N$ have isomorphic ultrapowers $\prod_\mu M\cong\prod_\mu N$, for some ultrafilter $\mu$. It is easy to see that isomorphic structures satisfy exactly the same modal assertions in the class of all models of a theory. Since $M\subset W$, it follows that the ultrapower of $M$ is extended to (a copy of) the ultrapower of $W$, and so $\prod_\mu M\models\possible\varphi$, and therefore also $\prod_\mu N\models\possible\varphi$. From this, since $N$ embeds into its ultrapower $\prod_\mu N$, it follows also that $N\models\possible\varphi$, as desired. $\Box$

Corollary. If one model elementarily embeds into another $M\prec N$, in the language $L$ of these structures, then this embedding is also elementary in the language $\possible(L)$.

Proof. To say $M\prec N$ in language $L$ is the same as saying that $M\equiv N$ in the language $L_M$, where we have added constants for every element of $M$, and interpreted these constants in $N$ via the embedding. Thus, by the theorem, it follows that $M\equiv N$ in the language $\possible(L_M)$, as desired. $\Box$

For example, every model $M$ embeds elementarily into its ultrapowers $\prod_\mu M$, in the language $\possible(L)$.

We’d like to point out next that these results do not extend to elementary equivalence in the full modal language $L^\possible$.

For a counterexample, let’s work in the class of all simple graphs, in the language with a binary predicate for the edge relation. (We’ll have no parallel edges, and no self-edges.) So the accessibility relation here is the induced subgraph relation.

Lemma. The 2-colorability of a graph is expressible in $\possible(L)$. Similarly for $k$-colorability for any finite $k$.

Proof. A graph is 2-colorable if we can partition its vertices into two sets, such that a vertex is in one set if and only if all its neighbors are in the other set. This can be effectively coded by adding two new vertices, call them red and blue, such that every node (other than red and blue) is connected to exactly one of these two points, and a vertex is connected to red if and only if all its neighbors are connected to blue, and vice versa. If the graph is $2$-colorable, then there is an extension realizing this statement, and if there is an extension realizing the statement, then (even if more than two points were added) the original graph must be $2$-colorable. $\Box$

A slightly more refined observation is that for any vertex $x$ in a graph, we can express the assertion, “the component of $x$ is $2$-colorable” by a formula in the language $\possible(L)$. We simply make the same kind of assertion, but drop the requirement that every node gets a color, and insist only that $x$ gets a color and the coloring extends from a node to any neighbor of the node, thereby ensuring the full connected component will be colored.
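The modal coding is the point of the lemma, but the underlying graph property itself is of course directly checkable. Here is a standard breadth-first sketch in Python (my own illustration, not the modal coding) testing whether the component of a vertex $x$ is $2$-colorable, a property worth keeping in mind for the counterexample below.

```python
from collections import deque

def component_two_colorable(adj, x):
    """Attempt a 2-coloring of the component of x by breadth-first search;
    the attempt fails exactly when the component contains an odd cycle."""
    color = {x: 0}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in color:
                color[v] = 1 - color[u]
                queue.append(v)
            elif color[v] == color[u]:
                return False   # adjacent vertices forced to the same color
    return True

pentagon = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
print(component_two_colorable(pentagon, 0))  # False: C_5 is an odd cycle
```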

Theorem. There are two graphs that are elementarily equivalent in the language $L$ of graph theory, and hence also in the language $\possible(L)$, but which are not elementarily equivalent in the full modal language $L^\possible$.

Proof. Let $M$ be a graph consisting of disjoint copies of a 3-cycle, a 5-cycle, a 7-cycle, and so on, with one copy of every odd-length cycle. Let $M^*$ be an ultrapower of $M$ by a nonprincipal ultrafilter.

Thus, $M^*$ will continue to have one 3-cycle, one 5-cycle, one 7-cycle and so on, for all the finite odd-length cycles, but then $M^*$ will have what it thinks are non-standard odd-length cycles, except that it cannot formulate the concept of “odd”. What it actually has are a bunch of $\mathbb{Z}$-chains.

In particular, $M^*$ thinks that there is an $x$ whose component is $2$-colorable, since a $\mathbb{Z}$-chain is $2$-colorable.

But $M$ does not think that there is an $x$ whose component is $2$-colorable, because an odd-length finite cycle is not $2$-colorable. $\Box$

Since we used an ultrapower, the same example also shows that the corollary above does not generalize to the full modal language. That is, we have $M$ embedding elementarily into its ultrapower $M^*$, but it is not elementary in the language $L^\possible$.

Let us finally notice that the Łoś theorem for ultraproducts fails even in the weaker modal language $\possible(L)$.

Theorem. There are models $M_i$ for $i\in\mathbb{N}$ and a sentence $\varphi$ in the language of these models, such that every nonprincipal ultraproduct $\prod_\mu M_i$ satisfies $\possible\varphi$, but no $M_i$ satisfies $\possible\varphi$.

Proof. In the class of all graphs, using the language of graph theory, let the $M_i$ be all the odd-length cycles. The ultraproduct $\prod_\mu M_i$ consists entirely of $\mathbb{Z}$-chains. In particular, the ultraproduct graph is $2$-colorable, but none of the $M_i$ are $2$-colorable. $\Box$

The connect-infinity game!

I saw the following image on Twitter and Reddit, an image suggesting an entire class of infinitary analogues of the game Connect-Four. What fun! Let’s figure it out!

I’m not sure to whom the image or the idea is due. Please comment if you have information. (See comments below for current information.)

The rules will naturally generalize those in Connect-Four. Namely, starting from an empty board, the players take turns placing their coins into the $\omega\times 4$ grid. When a coin is placed in a column, it falls down to occupy the lowest available cell. Let us assume for now that the game proceeds for $\omega$ many moves, whether or not the board becomes full, and the goal is to make a connected sequence in a row of $\omega$ many coins of your color (you don’t have to fill the whole row, but rather a connected infinite segment of it suffices). A draw occurs when neither or both players achieve their goals.

In the $\omega\times 6$ version of the game that is shown, and indeed in the $\omega\times n$ version for any finite $n$, I claim that neither player can force a win; both players have drawing strategies.

Theorem. In the game Connect-$\omega$ on a board of size $\omega\times n$, where $n$ is finite, neither player has a winning strategy; both players have drawing strategies.

Proof. For a concrete way to see this, observe that either player can ensure that there are infinitely many of their coins on the bottom row: they simply place a coin into some far-out empty column. This blocks a win for the opponent on the bottom row. Next, observe that neither player can afford to follow the strategy of always answering those moves on top, since this would lead to a draw, with a mostly empty board. Thus, it must happen that infinitely often we are able to place a coin onto the second row. This blocks a win for the opponent on the second row. And so on. In this way, either player can achieve infinitely many of their coins on each row, thereby blocking any row as a win for their opponent. So both players have drawing strategies. $\Box$

Let me point out that on a board of size $\omega\times n$, where $n$ is odd, we can also reach this conclusion by a strategy-stealing argument. Specifically, I argue first that the first player can have no winning strategy. Suppose $\sigma$ is a winning strategy for the first player on the $\omega\times n$ board, with $n$ odd, and let us describe a strategy for the second player. After the first move, the second player mentally ignores a finite left-initial segment of the playing board, which includes that first move and has an odd number of cells altogether (and hence an even number of empty cells remaining); the second player will now aim to win on the now-empty right side of the board, by playing as though playing first in a new game, using strategy $\sigma$. If the first player should ever happen to play on the ignored left side of the board, then the second player can answer somewhere there (it doesn’t matter where). In this way, the second player plays with $\sigma$ as though he were the first player, and so $\sigma$ cannot be winning for the first player, since the second player would win in this stolen manner.

Similarly, let us argue by strategy-stealing that the second player cannot have a winning strategy on the board $\omega\times n$ for odd finite $n$. Suppose that $\tau$ is a winning strategy for the second player on such a board. Let the first player always play at first in the left-most column. Because $n$ is odd, the second player will eventually have to play first in the second or later columns, leaving an even number of empty cells in the first column (perhaps zero). At this point, the first player can play as though he was the second player on the right-side board containing only that fresh move. If the opponent plays again to the left, then our player can also play in that region (since there were an even number of empty cells). Thus, the first player can steal the strategy $\tau$, and so it cannot be winning for the second player.

I am unsure about how to implement the strategy stealing arguments when $n$ is even. I shall give this more thought. In any case, the theorem for this case was already proved directly by the initial concrete argument, and in this sense we do not actually need the strategy stealing argument for this case.

Meanwhile, it is natural also to consider the $n\times\omega$ version of the game, which has only finitely many columns, each infinite. The players aim to get a sequence of $\omega$-many coins in a column. This is clearly impossible, as the opponent can prevent a win by always playing atop the most recent move. Thus:

Theorem. In the game Connect-$\omega$ on a board of size $n\times\omega$, where $n$ is finite, neither player has a winning strategy; both players have drawing strategies.

Perhaps the most natural variation of the game, however, occurs with a board of size $\omega\times\omega$. In this version, like the original Connect-Four, a player can win by either making a connected row of $\omega$ many coins, or a connected column or a connected diagonal of $\omega$ many coins. Note that we orient the $\omega$ size column upwards, so that there is no top cell, but rather, one plays by selecting a not-yet-filled column and then occupying the lowest available cell in that column.

Theorem. In the game Connect-$\omega$ on a board of size $\omega\times\omega$, neither player has a winning strategy. Both players have drawing strategies.

Proof. Consider the strategy-stealing arguments. If the first player has a winning strategy $\sigma$, then let us describe a strategy for the second player. After the first move, the second player will ignore finitely many columns at the left, including that first actual move, aiming to play on the empty right side of the board as though the first player, using the stolen strategy $\sigma$ (but with colors swapped). This will work fine, as long as the first player also plays on that part of the board. Whenever the first player plays on the ignored left-most part, simply respond by playing atop. This prevents a win in that part of the board, and so the second player will win on the right side by pretending to be first there. So there can be no such winning strategy $\sigma$ for the first player.

If the second player has a winning strategy $\tau$, then as before let the first player always play in the first column, until $\tau$ directs the second player to play in another column, which must eventually happen if $\tau$ is winning. At that moment, the first player can pretend to be second on the subboard omitting the first column. So $\tau$ cannot have been winning after all for the second player. $\Box$

In the analysis above, I was considering the game that proceeded in time $\omega$, with $\omega$ many moves. But such a play of the game may not actually have filled the board completely. So it is natural to consider a version of the game where the players continue to play transfinitely, if the board is not yet full.

So let us consider now the transfinite-play version of the game, where play proceeds transfinitely through the ordinals, until either the board is filled or one of the players has achieved the winning goal. Let us assume that the first player also plays first at limit stages, at $\omega$ and $\omega\cdot 2$ and $\omega^2$, and so on, if game play should happen to proceed for that long.

The concrete arguments that I gave above continue to work for the transfinite-play game on the boards of size $\omega\times n$ and $n\times\omega$.

Theorem. In the transfinite-play version of Connect-$\omega$ on boards of size $\omega\times n$ or $n\times\omega$, where $n$ is finite, neither player can have a winning strategy. Indeed, both players can force a draw while also filling the board in $\omega$ moves.

Proof. It is clear that on the $n\times\omega$ board, either player can force each column to have infinitely many coins of their color, and this fills the board, while also preventing a win for the opponent, as desired.

On the $\omega\times n$ board, consider a variation of the strategy I described above. I shall simply always play in the first available empty column, thereby placing my coin on the bottom row, until the opponent also plays in a fresh column. At that moment, I shall play atop her coin, thereby placing a coin in the second row; immediately afterward, I resume playing in the left-most available empty column (so as to force the board to be filled). I thus continue playing in the bottom row until the opponent also does so, which she must, and then I can add another coin to the second row, and so on. By always playing the first-available second-row slot, with all-empty second rows to the right, I can ensure that the opponent will eventually also make a second-row play (since otherwise I will have a winning condition on the second row), and at such a moment, I can also make a third-row play. By continuing in this way, I am able to place infinitely many coins on each row, while also forcing that the board becomes filled. $\Box$

Unfortunately, the transfinite-play game seems to break the strategy-stealing arguments, since the play is not symmetric for the players, as the first player plays first at limit stages.

Nevertheless, following some ideas of Timothy Gowers in the comments below, let me show that the second player has a drawing strategy.

Theorem. In the transfinite-play version of Connect-$\omega$ on a board of size $\omega\times\omega$, the second player has a drawing strategy.

Proof. We shall arrange that the second player blocks all possible winning configurations for the first player, or else matches any column win of the opponent with a column win of his own, so that the game is a draw either way. To block all row wins, the second player will arrange to occupy infinitely many cells in each row; to block all diagonal wins, the second player will aim to occupy infinitely many cells on each possible diagonal; and for the column wins, the second player will aim either to have infinitely many cells in each column or to copy a winning column of the opponent on another column.

To achieve these things, we simply play as follows. Take the columns in successive groups of three. On the first column in each group of three, that is, on the columns indexed $3m$, the second player will always answer a move by the first player on this column. In this way, the second player occupies every other cell in these columns, at the same levels in each such column. This will block all diagonal wins, because every diagonal winning configuration will need to go through such a cell.

On the remaining two columns in each group of three, columns $3m+1$ and $3m+2$, let the second player simply copy moves of the opponent on one of these columns by playing on the other. These moves will therefore be opposite colors, but at the same height. In this way, the second player ensures that he has infinitely many coins on each row, blocking the row wins. And also, this strategy ensures that in these two columns, at any limit stage, either neither player has achieved a winning configuration or both have.

Thus, we have produced a drawing strategy for the second player. $\Box$

Thus, there is no advantage to going first. What remains is to determine if the first player also has a drawing strategy, or whether the second player can actually force a win.

Gowers explains in the comments below also how to achieve such a copying mechanism to work on a diagonal, instead of just on a column.

I find it also fascinating to consider the natural generalizations of the game to larger ordinals. We may consider the game of Connect-$\alpha$ on a board of size $\kappa\times\lambda$, for any ordinals $\alpha,\kappa,\lambda$, with transfinite play, proceeding until the board is filled or the winning conditions are achieved. Clearly, there are some instances of this game where a player has a winning strategy, such as the original Connect-Four configuration, where the first player wins, and presumably many other instances.

Question. In which instances of Connect-$\alpha$ on a board of size $\kappa\times\lambda$ does one of the players have a winning strategy?

It seems to me that the groups-of-three-columns strategy described above generalizes to show that the second player has at least a drawing strategy in Connect-$\alpha$ on board $\kappa\times\lambda$, whenever $\alpha$ is infinite.

Stay tuned…

Alan Turing, On computable numbers

I have been reading Alan Turing’s paper, On computable numbers, with an application to the Entscheidungsproblem, an amazing classic, written by Turing while he was a student in Cambridge. This is the paper in which Turing introduces and defines his Turing machine concept, deriving it from a philosophical analysis of what a human computer is doing when carrying out a computational task.

The paper is an incredible achievement. He accomplishes so much: he defines and explains the machines; he proves that there is a universal Turing machine; he shows that there can be no computable procedure for determining the validities of a sufficiently powerful formal proof system; he shows that the halting problem is not computably decidable; he argues that his machine concept captures our intuitive notion of computability; and he develops the theory of computable real numbers.

What I was extremely surprised to find, however, and what I want to tell you about today, is that despite the title of the article, Turing adopts an incorrect approach to the theory of computable numbers. His central definition is what is now usually regarded as a mistaken way to proceed with this concept.

Let me explain. Turing defines that a computable real number is one whose decimal (or binary) expansion can be enumerated by a finite procedure, by what we now call a Turing machine. You can see this in the very first sentence of his paper, and he elaborates on and confirms this definition in detail later on in the paper.
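In modern terms, Turing’s representation amounts to something like the following sketch, where a real is presented by a program enumerating its digits one at a time (the generator idiom here is of course my anachronism, not Turing’s formalism):

```python
from itertools import islice

def third():
    """A Turing-style presentation of 1/3: a program that enumerates the
    decimal digits of 0.333..., one at a time, forever."""
    while True:
        yield 3

print(list(islice(third(), 8)))    # [3, 3, 3, 3, 3, 3, 3, 3]
```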

He subsequently develops the theory of computable functions of computable real numbers, where one considers computable functions defined on these computable numbers. The computable functions are defined not on the reals themselves, however, but on the programs that enumerate the digits of those reals. Thus, for the role they play in Turing’s theory, a computable real number is not actually regarded as a real number as such, but rather as a program for enumerating the digits of a real number. And it is just this aspect of Turing’s conception where his approach becomes problematic.

One specific problem with Turing’s approach is that on this account, it turns out that the operations of addition and multiplication for computable real numbers are not computable operations. Of course this is not what we want.

The basic mathematical fact in play is that the digits of the sum $a+b$ of two real numbers are not a continuous function of the digits of $a$ and $b$ separately; in some cases, one cannot determine with certainty even the initial digits of $a+b$ knowing only finitely many digits of $a$ and $b$, however many.

To see this, consider the following sum $a+b$:
$$\begin{align*}
&0.343434343434\cdots \\
+\quad &0.656565656565\cdots \\[-7pt]
&\hskip-.5cm\rule{2in}{.4pt}\\
&0.999999999999\cdots
\end{align*}$$
If you add up the numbers digit-wise, you get $9$ in every place. That much is fine, and of course we should accept either $0.9999\cdots$ or $1.0000\cdots$ as correct answers for $a+b$ in this instance, since those are both legitimate decimal representations of the number $1$.

The problem, I claim, is that we cannot assign the digits of $a+b$ in a way that depends only on finitely many digits of each of $a$ and $b$. The basic difficulty is that if we inspect only finitely many digits of $a$ and $b$, then we cannot be sure whether the pattern will continue or whether there will eventually be a carry, and depending on how the digits proceed, the initial digits of $a+b$ can be affected.

In detail, suppose that we have committed to the idea that the initial digits of $a+b$ are $0.999$, on the basis of sufficiently many digits of $a$ and $b$. Let $a'$ and $b'$ be numbers that agree with $a$ and $b$ on those finite parts, but afterwards have all $7$s. In this case, the sum $a'+b'$ will involve a carry, which will turn all the nines up to that point to $0$, with a leading $1$, making $a'+b'$ strictly greater than $1$, with decimal representation $1.000\cdots00005555\cdots$. Thus, the initial-digits answer $0.999$ would be wrong for $a'+b'$, even though $a'$ and $b'$ agreed with $a$ and $b$ on the sufficiently many digits that supposedly justified the $0.999$ answer. On the other hand, if we had committed ourselves to $1.000$ for $a+b$, on the basis of finite parts of $a$ and $b$ separately, then let $a''$ and $b''$ be all $2$s beyond those finite parts, in which case $a''+b''$ is definitely less than $1$, making $1.000$ wrong as well.
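One can check the two cases concretely with exact rational arithmetic. In this sketch, finite tails of $7$s and $2$s stand in for the infinite ones; both pairs agree with $a$ and $b$ to six digits, yet the sums land on opposite sides of $1$:

```python
from fractions import Fraction

def with_tail(prefix, tail_digit, n_tail=50):
    """The rational 0.d1 d2 ... dk t t t ... t, with n_tail copies of the
    tail digit standing in for an infinite tail."""
    digits = prefix + [tail_digit] * n_tail
    return sum(Fraction(d, 10**(i + 1)) for i, d in enumerate(digits))

a_pre, b_pre = [3, 4] * 3, [6, 5] * 3      # agree with a and b to six places

print(with_tail(a_pre, 7) + with_tail(b_pre, 7) > 1)   # True: the carry occurs
print(with_tail(a_pre, 2) + with_tail(b_pre, 2) < 1)   # True: no carry
```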

Therefore, there is no algorithm to compute the digits of $a+b$ continuously from the digits of $a$ and $b$ separately. It follows that there can be no computable algorithm for computing the digits of $a+b$, given the programs that compute $a$ and $b$ separately, which is how Turing defines computable functions on the computable reals. (This consequence is a subtly different and stronger claim, but one can prove it using the Kleene recursion theorem. Namely, let $a=.343434\cdots$, and consider a program to enumerate a number $b$, which begins with $0.656565$ and keeps repeating $65$ until it sees that the addition program has given the initial digits for $a+b$, at which moment the program for $b$ switches to all $7$s or all $2$s in such a way as to refute that result. The Kleene recursion theorem is used to establish that there is indeed such a self-referential program enumerating $b$.)

One can make similar examples showing that multiplication and many other very simple functions are not computable, if one insists that a computable number is an algorithm enumerating the digits of the number.

So what is the right definition of computable number? Turing was right that in working with computable real numbers, we want to be working with the programs that compute them, rather than with the real numbers themselves. What is needed is a better way of saying that a given program computes a given real.

The right definition, widely used today, is that we want an algorithm not to compute exactly the digits of the number, but rather to compute approximations to the number, as close as desired, with a known degree of accuracy. One can define a computable real number as a computable sequence of rational numbers such that the $n^{th}$ number is within $1/2^n$ of the target number. This is equivalent to being able to compute rational intervals around the target real of size less than any specified accuracy, and there are many other equivalent ways to do it. With this concept of computable real number, the operations of addition and multiplication, and indeed all the familiar operations on the real numbers, are computable.
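Here is a minimal sketch of this representation and of why addition becomes computable on it: to obtain the sum to within $1/2^n$, one simply queries each summand to within $1/2^{n+1}$. The helper names and the bisection routine for $\sqrt 2$ are merely illustrative.

```python
from fractions import Fraction

def add(a, b):
    """Addition on approximation-representations: a and b are functions
    with |a(n) - x| <= 1/2^n; querying each summand at n + 1 makes the
    total error at most 1/2^(n+1) + 1/2^(n+1) = 1/2^n, as required."""
    return lambda n: a(n + 1) + b(n + 1)

def sqrt2(n):
    """A rational within 1/2^n of sqrt(2), found by bisection."""
    lo, hi = Fraction(1), Fraction(2)
    while hi - lo > Fraction(1, 2**n):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if mid * mid < 2 else (lo, mid)
    return (lo + hi) / 2

def one(n):
    """The real number 1, as a (constant) approximation function."""
    return Fraction(1)

print(float(add(sqrt2, one)(20)))   # 2.414213562..., within 1/2^20 of 1+sqrt(2)
```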

But let me clear up a confusing point. Although I have claimed that Turing’s original definition of computable real number is incorrect, and I have explained how we usually define this concept today, the mathematical fact is that a real number $x$ has a computable presentation in Turing’s sense (we can compute the digits of $x$) if and only if it has a computable presentation in the contemporary sense (we can compute rational approximations to any specified accuracy). Thus, in terms of which real numbers we are talking about, the two approaches are extensionally the same.

Let me quickly prove this. If a real number $x$ is computable in Turing’s sense, so that we can compute the digits of $x$, then we can obviously compute rational approximations to any desired accuracy, simply by taking sufficiently many digits. And conversely, if a real number $x$ is computable in the contemporary sense, so we can compute rational approximations to any specified accuracy, then either it is itself a rational number, in which case we can certainly compute the digits of $x$, or else it is irrational, in which case for any specified digit place, we can wait until we have a rational approximation forcing it to one side or the other, and thereby come to know this digit. (Note: there are issues of intuitionistic logic occurring here, precisely because we cannot tell from the approximation algorithm itself which case we are in.) Note also that this argument works in any desired base.
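The digit-extraction step in the irrational case can be sketched as follows, reusing the sqrt2 approximation from the earlier sketch; for a rational $x$ whose expansion terminates within the first $k$ places, the loop below would run forever, which is exactly the intuitionistic subtlety just mentioned.

```python
from fractions import Fraction
from math import floor

def first_k_digits(approx, k):
    """Return floor(x * 10^k) for the real x presented by approx, assuming
    x is irrational: refine until the rational interval determined by the
    approximation lies strictly between consecutive multiples of 1/10^k,
    which forces the first k digits."""
    n = 1
    while True:
        q = approx(n)
        lo = (q - Fraction(1, 2**n)) * 10**k
        hi = (q + Fraction(1, 2**n)) * 10**k
        if floor(lo) == floor(hi):        # the interval avoids the digit grid
            return floor(lo)
        n += 1

print(first_k_digits(sqrt2, 4))           # 14142, since sqrt(2) = 1.4142...
```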

So there is something of a philosophical problem here. The issue isn’t that Turing has misidentified particular reals as being computable or non-computable or has somehow got the computable reals wrong extensionally as a subset of the real numbers, since every particular real number has Turing’s kind of representation if and only if it has the approximation kind of representation. Rather, the problem is that because we want to deal with computable real numbers by working with the programs that represent them, Turing’s approach means that we cannot regard addition as a computable function on the computable reals. There is no computable procedure that when given two programs for enumerating the digits of real numbers $a$ and $b$ returns a program for enumerating the digits of the sum $a+b$. But if you use the contemporary rational-approximation representation of computable real number, then you can computably produce a program for the sum, given programs for the input reals. This is the sense in which Turing’s approach is wrong.