The propagation of error in classical geometry constructions

I’d like to discuss the issue of error and error propagation in the constructions of classical geometry. How does error propagate in these constructions? How sensitive are the familiar classical constructions to small errors in the use of the straightedge or compass?

Let me illustrate what I have in mind by considering the classical construction of Apollonius of the perpendicular bisector of a line segment $AB$.  One forms two circles, centered at $A$ and $B$ respectively, each with radius $AB$. These arcs intersect at points $P$ and $Q$, respectively, which form the perpendicular bisector, meeting the original segment at the midpoint $C$.

That is all fine and good, and one can easily prove that indeed $PQ$ is perpendicular to $AB$ and that $C$ is the midpoint of $AB$, as desired.

When carrying out such a construction in practice, however, there will inevitably be some small errors. We do not expect to implement it exactly, with infinite precision, but rather, we expect some small errors in the placement of the compass or straightedge, and perhaps these errors may accumulate and they propagate through the construction. What I would like to discuss is the sensitivity of this construction and the other classical constructions to these small errors.

For example, suppose that we are given points $A$ and $B$. When we seek to construct the circle centered at $A$ with radius $AB$, we place the point of the compass at $A$, and this placement may have some small error deviating from $A$, landing somewhere in the blue circle. Similarly, the writing (or etching) implement end of the compass is placed at $B$, with its own possibly different error, landing the orange circle at $B$. The arc actually resulting will be some arc arising with some such small errors, like this:

We may represent the space of all arcs that could arise in conformance with those error bounds as the blurred orange arc below. This image was created simply by drawing many dozens of such arcs in orange, with various choices for the center and radius within the error circles, and blending the results together.

We carry out the same construction with similar errors for the other arc, centered at $B$ and passing through $A$. These arcs overlap in the darker orange regions above and below, determining the points $P$ and $Q$.

The actual arcs we draw and the corresponding vertical will land somewhere inside these blurred regions, perhaps like this:

Note that in this particular case, the resulting line $PQ$ is noticably non-perpendicular to $AB$, and the resulting point $C$ is noticably not the midpoint. Consider the space of all the bisectors $PQ$ that might arise in conformance with our errors on $A$ and $B$, showing the result as the vertical red shaded region.   The darker red region is the space of possible points $C$ that we might have constructed as the “midpoint” $C$, in conformance with the error estimates.

Given the size of the original error bounds on the points $A$ and $B$, it may be surprising that even such a standard simple construction as this — constructing the perpendicular bisector and midpoint of a segment — appears to have comparatively large error propagation, since the shaded red region $C$ is quite large and includes many points that one would not say are close to being the midpoint.  In this sense, the Apollonius perpendicular bisector construction appears to be sensitive to the errors of compass placement.

Is there a better construction? For example, in terms of improving the accuracy at least of the perpendicularity of $PQ$ to $AB$, it would seem to help to use a much larger circle, which would lower the variation in the resulting “right” angle.  But this is partly because we have so far assumed that compass error arises only with the placement of the points of the compass, and not during the course of actually drawing the arc. But of course, one can imagine that errors arise from a flexing of the compass during use, causing it to deviate from circularity, or from slippage, which might reasonably be expected to cause increasing error with the length or degree of the arc, and so on, and such a model of error might have greater errors with large circles.

One could in principle carry out similar analyses for any geometric construction, and use the corresponding results to compare the sensitivity of various methods for constructing the same object, as well as modeling different sources of error.  The goal might be to mount a precise analysis of all the standard constructions and compare competing constructions for accuracy.

There is a literature of papers doing precisely this, and I will try to post some references later (or please do so in the comments, if you have some good ones).

Another approach to error estimation would be to think of the errors at points $A$ and $B$ as probability distributions, centered at $A$ and $B$ and with a certain variation; and one then gets corresponding distributions for the points $P$ and $Q$, which are not rigid shapes as in my diagrams, but qualitatively similar distributions spread out in that region, and a resulting probability distribution for the point $C$.

Finally, let me mention that one might hope to improve the accuracy of a construction, simply by repeating it and averaging the result, or by some other convergence algorithm. For example, as a first step, we might simply perform the Apollonius bisector construction twice, producing midpoint candidates $C_0$ and $C_1$, and we could proceed simply find the midpoint of $C_0C_1$ as a further presumably more accurate midpoint. Or we could iterate in some other manner and hope to converge to the actual midpoint. For example, we could produce seven midpoint candidates, and take the resulting median point.

Draw an infinite chessboard in perspective, using straightedge only

I’d like to explain to you how to draw chessboards by hand in perfect perspective, using only a straightedge.  In this post, I’ll explain how to construct chessboards of any size, starting with the size of the basic unit square.

This post follows up on the post I made yesterday about how to draw a chessboard in perspective view, using only a straightedge.  That method was a subdivision method, where one starts with the boundary of the desired board, and then subdivides to make a chessboard. Now, we start with the basic square and build up. This method is actually quite efficient for quickly making very large boards in perspective view.

I want to emphasize that this is something that you can actually do, right now. It’s fun! All you need is a piece of paper, a pencil and a straightedge. I’ll wait right here while you gather your materials. Use a ruler or a chop stick (as I did) or the edge of a notebook or the lid of a box. Sit at your table and draw a huge chessboard in perspective. You can totally do this.

Start with a horizon, having two points at infinity (orange), at left and right, and a third point midway between them (brown), which we will call the diagonal infinity. Also, mark the front corner of your chess board (blue).

Extend the front corner to the points at infinity. And then mark off (red) a point that will be a measure of the grid spacing in the chessboard. This will the be size of the front square.

You can extend that point to infinity at the right. This delimits the first rank of the chessboard.

Next, extend the front corner of the board to the diagonal infinity.

The intersection of that diagonal with the previous line determines a point, which when extended to infinity at the left, produces the first square of the chessboard.

And that line determines a new point on the leading rank edge. Extend that point up to the diagonal infinity, which determines another point on the second rank line.


Extend that line to infinity at the left, which determines another point on the leading rank edge.

Continuing in this way, one can produce as many first rank squares as desired. Go ahead and do that. At each step, you extend up to the diagonal infinity, which determines a new point, which when extended to infinity at the left determines another point, and so on.

If you should now reflect on the current diagram, you may notice that we have actually determined many further points in the grid than we have mentioned — and thanks to my daughter Hypatia for noticing this simplification — for there is a whole triangle of further intersection points between the files and the diagonals.

We can use these points (and we do not need them all) to construct the rest of the board, by drawing out the lines to infinity at the right. Thus, we construct the whole chessboard:

One can construct a perspective chessboard of any size this way, and one can simply continue with the construction and make it larger, if desired.

It will look a little better if you add a point at infinity down below (and do so directly below the diagonal point at infinity, but a good distance down below the board), and extend the board downward one level. The corresponding diagram on yesterday’s post might be helpful.


You can now color the tile pattern, and you’ll have a chessboard in perfect perspective view.


If you keep going, you can make extremely large chessboards. In time, I hope that you will come to learn how to complete an infinite chess board in finite time.


Draw a chessboard in perspective view, using straightedge only

Let me show you how to hand-draw an image of a chessboard in perspective, using only a straightedge. There is no need to measure any distances or to make calculations of any kind.  All you need is a straightedge, paper and your pen or pencil.

Imagine the vast doodling potential!  A person who happens to be stuck in a lecture on some other less interesting topic could easily make one or more elaborate chessboards in perfect perspective, using only a straightedge! Please carry out the construction and then share your resulting images. I would love to see them.

To begin, give yourself a horizon at the top of the page, with two points at infinity (in orange), and also two points (blue) that will become the front and back corner of your chessboard.  You can play around with different arrangements of these points, which will lead ultimately to different perspective views.

Using your straightedge, draw the lines from those reference points to the points at infinity. The resulting enclosed region will ultimately become the main chess board. One can alternatively think of starting with the front edges of the desired board, and then setting a horizon and using that to determine the points at infinity, and then adding the back edges. If you like, you can add a third point at infinity down below, and a front bottom reference point (blue), to give the board some thickness. Construct the lines to the bottom point at infinity.

The result is now the main outline of your chessboard as a rectangular solid.

Next, construct the center point of the top face, by drawing the two diagonals and finding the intersection point. I call this the center point, because it is the point that represents in the perspective view the actual center point of the chessboard, even though this point is not in the “center” of the quadrilateral representing the board. It is remarkable that one can find that point without needing to measure or calculate anything, simply because the two straight lines of the diagonal of the chessboard intersect at that point, and this remains true in the perspective view. This is the key idea that enables the entire construction method to proceed with only a straightedge.

Construct the midpoint lines by drawing the lines from that centerpoint to the points at infinity.

Now you have the chessboard with the main midlines drawn.

Construct the center points of the two diagonal squares, by intersecting the diagonals of each of them.

Construct the lines that extend those points to the points at infinity.

Thus, you have constructed the main 4 x 4 grid.

You can extend the grid lines down to the bottom point at infinity like so:

One can stop with the 4 x 4 board, if desired. Simply add suitable shading, and you’ve completed the 4 x 4 chessboard in perspective.

Alternatively, one can continue with one more iteration to construct the 8 x 8 board. From the 4 x 4 grid (with no shading yet) simply construct the center points of the squares on the main diagonal, and extend those lines to infinity.  This will enable you to draw the 8 x 8 grid lines, and after shading, you’ll have the complete chessboard.

The initial arrangement of points affects the nature of the perspective view. Having the points at infinity very far away will produce something closer to orthoprojection; having them close produces a more extreme perspective, which simulates a view from a vantage point extremely close to the board.

Please give the construction a try! All you need is paper, pencil and a straightedge! Provide links below in the comments to photos of your creations.

Meanwhile, let me point you towards my follow post, How to draw infinite chessboards by hand in perfect perspective, using only a straightedge. The difference between the methods is that the method of this post is about subdividing a given board, and the other method is about generating arbitrarily large chessboards from a given unit square.

I learned these construction method years ago from my CUNY geometer colleague Ilya Kofman.

The axiom of well-ordered replacement is equivalent to full replacement over Zermelo + foundation

In recent work, Alfredo Roque Freire and I have realized that the axiom of well-ordered replacement is equivalent to the full replacement axiom, over the Zermelo set theory with foundation.

The well-ordered replacement axiom is the scheme asserting that if $I$ is well-ordered and every $i\in I$ has unique $y_i$ satisfying a property $\phi(i,y_i)$, then $\{y_i\mid i\in I\}$ is a set. In other words, the image of a well-ordered set under a first-order definable class function is a set.

Alfredo had introduced the theory Zermelo + foundation + well-ordered replacement, because he had noticed that it was this fragment of ZF that sufficed for an argument we were mounting in a joint project on bi-interpretation. At first, I had found the well-ordered replacement theory a bit awkward, because one can only apply the replacement axiom with well-orderable sets, and without the axiom of choice, it seemed that there were not enough of these to make ordinary set-theoretic arguments possible.

But now we know that in fact, the theory is equivalent to ZF.

Theorem. The axiom of well-ordered replacement is equivalent to full replacement over Zermelo set theory with foundation.

$$\text{ZF}\qquad = \qquad\text{Z} + \text{foundation} + \text{well-ordered replacement}$$

Proof. Assume Zermelo set theory with foundation and well-ordered replacement.

Well-ordered replacement is sufficient to prove that transfinite recursion along any well-order works as expected. One proves that every initial segment of the order admits a unique partial solution of the recursion up to that length, using well-ordered replacement to put them together at limits and overall.

Applying this, it follows that every set has a transitive closure, by iteratively defining $\cup^n x$ and taking the union. And once one has transitive closures, it follows that the foundation axiom can be taken either as the axiom of regularity or as the $\in$-induction scheme, since for any property $\phi$, if there is a set $x$ with $\neg\phi(x)$, then let $A$ be the set of elements $a$ in the transitive closure of $\{x\}$ with $\neg\phi(a)$; an $\in$-minimal element of $A$ is a set $a$ with $\neg\phi(a)$, but $\phi(b)$ for all $b\in a$.

Another application of transfinite recursion shows that the $V_\alpha$ hierarchy exists. Further, we claim that every set $x$ appears in the $V_\alpha$ hierarchy. This is not immediate and requires careful proof. We shall argue by $\in$-induction using foundation. Assume that every element $y\in x$ appears in some $V_\alpha$. Let $\alpha_y$ be least with $y\in V_{\alpha_y}$. The problem is that if $x$ is not well-orderable, we cannot seem to collect these various $\alpha_y$ into a set. Perhaps they are unbounded in the ordinals? No, they are not, by the following argument. Define an equivalence relation $y\sim y’$ iff $\alpha_y=\alpha_{y’}$. It follows that the quotient $x/\sim$ is well-orderable, and thus we can apply well-ordered replacement in order to know that $\{\alpha_y\mid y\in x\}$ exists as a set. The union of this set is an ordinal $\alpha$ with $x\subseteq V_\alpha$ and so $x\in V_{\alpha+1}$. So by $\in$-induction, every set appears in some $V_\alpha$.

The argument establishes the principle: for any set $x$ and any definable class function $F:x\to\text{Ord}$, the image $F\mathrel{\text{”}}x$ is a set. One proves this by defining an equivalence relation $y\sim y’\leftrightarrow F(y)=F(y’)$ and observing that $x/\sim$ is well-orderable.

We can now establish the collection axiom, using a similar idea. Suppose that $x$ is a set and every $y\in x$ has a witness $z$ with $\phi(y,z)$. Every such $z$ appears in some $V_\alpha$, and so we can map each $y\in x$ to the smallest $\alpha_y$ such that there is some $z\in V_{\alpha_y}$ with $\phi(y,z)$. By the observation of the previous paragraph, the set of $\alpha_y$ exists and so there is an ordinal $\alpha$ larger than all of them, and thus $V_\alpha$ serves as a collecting set for $x$ and $\phi$, verifying this instance of collection.

From collection and separation, we can deduce the replacement axiom $\Box$

I’ve realized that this allows me to improve an argument I had made some time ago, concerning Transfinite recursion as a fundamental principle. In that argument, I had proved that ZC + foundation + transfinite recursion is equivalent to ZFC, essentially by showing that the principle of transfinite recursion implies replacement for well-ordered sets. The new realization here is that we do not need the axiom of choice in that argument, since transfinite recursion implies well-ordered replacement, which gives us full replacement by the argument above.

Corollary. The principle of transfinite recursion is equivalent to the replacement axiom over Zermelo set theory with foundation.

$$\text{ZF}\qquad = \qquad\text{Z} + \text{foundation} + \text{transfinite recursion}$$

There is no need for the axiom of choice.

A question in set-theoretic geology: if $M[G][K]=M[H][K],$ then can we conlude $M[G]=M[H]$?

I was recently asked this interesting question on set-theoretic geology by Iian Smythe, a set-theory post-doc at Rutgers University; the problem arose in the context of one of this current research projects.

Question. Assume that two product forcing extensions are the same $$M[G][K]=M[H][K],$$ where $M[G]$ and $M[H]$ are forcing extensions of $M$ by the same forcing notion $\mathbb{P}$, and $K\subset\mathbb{Q}\in M$ is both $M[G]$ and $M[H]$-generic with respect to this further forcing $\mathbb{Q}$.  Can we conclude that $$M[G]=M[H]\ ?$$ Can we make this conclusion at least in the special case that $\mathbb{P}$ is adding a Cohen real and $\mathbb{Q}$ is collapsing the continuum?

It seems natural to hope for a positive answer, because we are aware of many such situations that arise with forcing, where indeed $M[G]=M[H]$. Nevertheless, the answer is negative. Indeed, we cannot legitimately make this conclusion even when both steps of forcing are adding merely a Cohen real. And such a counterexample implies that there is a counterexample of the type mentioned in the question, simply by performing further collapse forcing.

Theorem. For any countable model $M$ of set theory, there are $M$-generic Cohen reals $c$, $d$ and $e$, such that

  1. The Cohen reals $c$ and $e$ are mutually generic over $M$.
  2. The Cohen reals $d$ and $e$ are mutually generic over $M$.
  3. These two pairs produce the same forcing extension $M[c][e]=M[d][e]$.
  4. But  the intermediate models are different $M[c]\neq M[d]$.

Proof. Fix $M$, and let $c$ and $e$ be any two mutually generic Cohen reals over $M$. Let us view them as infinite binary sequences, that is, as elements of Cantor space. In the extension $M[c][e]$, let $d=c+e \mod 2$, in each coordinate. That is, we get $d$ from $c$ by flipping bits, but only on coordinates that are $1$ in $e$. This is the same as applying a bit-flipping automorphism of the forcing, which is available in $M[e]$, but not in $M$. Since $c$ is $M[e]$-generic by reversing the order of forcing, it follows that $d$ also is $M[e]$-generic, since the automorphism is in $M[e]$. Thus, $d$ and $e$ are mutually generic over $M$. Further, $M[c][e]=M[d][e]$, because $M[e][c]=M[e][d]$, as $c$ and $d$ were isomorphic generic filters by an isomorphism in $M[e]$. But finally, $M[c]$ and $M[d]$ are not the same, because from $c$ and $d$ together we can construct $e$, because we can tell exactly which bits were flipped. $\Box$

If one now follows the $e$ forcing with collapse forcing, one achieves a counterexample model of the type mentioned in the question, namely, with $M[c][e*K]=M[d][e*K]$, but $M[c]\neq M[d]$.

I have a feeling that my co-authors on a current paper in progress, Set-theoretic blockchains, on the topic of non-amalgamation in the generic multiverse, will tell me that the argument above is an instance of some of the theorems we prove in the latter part of that paper. (Miha, please tell me in the comments, if you see this, or tell me where I have seen this argument before; I think I made this argument or perhaps seen it before.) The paper is

  • [DOI] M. E. Habič, J. D. Hamkins, L. D. Klausner, J. Verner, and K. J. Williams, “Set-theoretic blockchains,” Archive for Mathematical Logic, 2019.
    author = {Miha E. Habič and Joel David Hamkins and Lukas Daniel Klausner and Jonathan Verner and Kameryn J. Williams},
    title = {Set-theoretic blockchains},
    journal="Archive for Mathematical Logic",
    abstract="Given a countable model of set theory, we study the structure of its generic multiverse, the collection of its forcing extensions and ground models, ordered by inclusion. Mostowski showed that any finite poset embeds into the generic multiverse while preserving the nonexistence of upper bounds. We obtain several improvements of his result, using what we call the blockchain construction to build generic objects with varying degrees of mutual genericity. The method accommodates certain infinite posets, and we can realize these embeddings via a wide variety of forcing notions, while providing control over lower bounds as well. We also give a generalization to class forcing in the context of second-order set theory, and exhibit some further structure in the generic multiverse, such as the existence of exact pairs.",
    note = {},
    abstract = {},
    eprint = {1808.01509},
    archivePrefix = {arXiv},
    primaryClass = {math.LO},
    keywords = {},
    source = {},
    url = {},


A new proof of the Barwise extension theorem, without infinitary logic

I have found a new proof of the Barwise extension theorem, that wonderful yet quirky result of classical admissible set theory, which says that every countable model of set theory can be extended to a model of $\text{ZFC}+V=L$.

Barwise Extension Theorem. (Barwise 1971) $\newcommand\ZF{\text{ZF}}\newcommand\ZFC{\text{ZFC}}$ Every countable model of set theory $M\models\ZF$ has an end-extension to a model of $\ZFC+V=L$.

The Barwise extension theorem is both (i) a technical culmination of the pioneering methods of Barwise in admissible set theory and infinitary logic, including the Barwise compactness and completeness theorems and the admissible cover, but also (ii) one of those rare mathematical theorems that is saturated with significance for the philosophy of mathematics and particularly the philosophy of set theory. I discussed the theorem and its philosophical significance at length in my paper, The multiverse perspective on the axiom of constructibility, where I argued that it can change how we look upon the axiom of constructibility and whether this axiom should be considered ‘restrictive,’ as it often is in set theory. Ultimately, the Barwise extension theorem shows how wrong a model of set theory can be, if we should entertain the idea that the set-theoretic universe continues growing beyond it.

Regarding my new proof, below, however, what I find especially interesting about it, if not surprising in light of (i) above, is that it makes no use of Barwise compactness or completeness and indeed, no use of infinitary logic at all! Instead, the new proof uses only classical methods of descriptive set theory concerning the representation of $\Pi^1_1$ sets with well-founded trees, the Levy and Shoenfield absoluteness theorems, the reflection theorem and the Keisler-Morley theorem on elementary extensions via definable ultrapowers. Like the Barwise proof, my proof splits into cases depending on whether the model $M$ is standard or nonstandard, but another interesting thing about it is that with my proof, it is the $\omega$-nonstandard case that is easier, whereas with the Barwise proof, the transitive case was easiest, since one only needed to resort to the admissible cover when $M$ was ill-founded. Barwise splits into cases on well-founded/ill-founded, whereas in my argument, the cases are $\omega$-standard/$\omega$-nonstandard.

To clarify the terms, an end-extension of a model of set theory $\langle M,\in^M\rangle$ is another model $\langle N,\in^N\rangle$, such that the first is a substructure of the second, so that $M\subseteq N$ and $\in^M=\in^N\upharpoonright M$, but further, the new model does not add new elements to sets in $M$. In other words, $M$ is an $\in$-initial segment of $N$, or more precisely: if $a\in^N b\in M$, then $a\in M$ and hence $a\in^M b$.

Set theory, of course, overflows with instances of end-extensions. For example, the rank-initial segments $V_\alpha$ end-extend to their higher instances $V_\beta$, when $\alpha<\beta$; similarly, the hierarchy of the constructible universe $L_\alpha\subseteq L_\beta$ are end-extensions; indeed any transitive set end-extends to all its supersets. The set-theoretic universe $V$ is an end-extension of the constructible universe $L$ and every forcing extension $M[G]$ is an end-extension of its ground model $M$, even when nonstandard. (In particular, one should not confuse end-extensions with rank-extensions, also known as top-extensions, where one insists that all the new sets have higher rank than any ordinal in the smaller model.)

Let’s get into the proof.

Proof. Suppose that $M$ is a model of $\ZF$ set theory. Consider first the case that $M$ is $\omega$-nonstandard. For any particular standard natural number $k$, the reflection theorem ensures that there are arbitrarily high $L_\alpha^M$ satisfying $\ZFC_k+V=L$, where $\ZFC_k$ refers to the first $k$ axioms of $\ZFC$ in a fixed computable enumeration by length. In particular, every countable transitive set $m\in L^M$ has an end-extension to a model of $\ZFC_k+V=L$. By overspill (that is, since the standard cut is not definable), there must be some nonstandard $k$ for which $L^M$ thinks that every countable transitive set $m$ has an end-extension to a model of $\ZFC_k+V=L$, which we may assume is countable. This is a $\Pi^1_2$ statement about $k$, which will therefore also be true in $M$, by the Shoenfield absolutenss theorem. It will also be true in all the elementary extensions of $M$, as well as in their forcing extensions. And indeed, by the Keisler-Morley theorem, the model $M$ has an elementary top extension $M^+$. Let $\theta$ be a new ordinal on top of $M$, and let $m=V_\theta^{M^+}$ be the $\theta$-rank-initial segment of $M^+$, which is a top-extension of $M$. Let $M^+[G]$ be a forcing extension in which $m$ has become countable. Since the $\Pi^1_2$ statement is true in $M^+[G]$, there is an end-extension of $\langle m,\in^{M^+}\rangle$ to a model $\langle N,\in^N\rangle$ that $M^+[G]$ thinks satisfies $\ZFC_k+V=L$. Since $k$ is nonstandard, this theory includes all the $\ZFC$ axioms, and since $m$ end-extends $M$, we have found an end-extension of $M$ to a model of $\ZFC+V=L$, as desired.

It remains to consider the case where $M$ is $\omega$-standard. By the Keisler-Morley theorem, let $M^+$ be an elementary top-extension of $M$. Let $\theta$ be an ordinal of $M^+$ above $M$, and consider the corresponding rank-initial segment $m=V_\theta^{M^+}$, which is a transitive set in $M^+$ that covers $M$. If $\langle m,\in^{M^+}\rangle$ has an end-extension to a model of $\ZFC+V=L$, then we’re done, since such a model would also end-extend $M$. So assume toward contradiction that there is no such end-extension of $m$. Let $M^+[G]$ be a forcing extension in which $m$ has become countable. The assertion that $m$ has no end-extension to a model of $\ZFC+V=L$ is actually true and hence true in $M^+[G]$. This is a $\Pi^1_1$ assertion there about the real coding $m$. Every such assertion has a canonically associated tree, which is well-founded exactly when the statement is true. Since the statement is true in $M^+[G]$, this tree has some countable rank $\lambda$ there. Since these models have the standard $\omega$, the tree associated with the statement is the same for us as inside the model, and since the statement is actually true, the tree is actually well founded. So the rank $\lambda$ must come from the well-founded part of the model.

If $\lambda$ happens to be countable in $L^{M^+}$, then consider the assertion, “there is a countable transitive set, such that the assertion that it has no end-extension to a model of $\ZFC+V=L$ has rank $\lambda$.” This is a $\Sigma_1$ assertion, since it is witnessed by the countable transitive set and the ranking function of the tree associated with the non-extension assertion. Since the parameters are countable, it follows by Levy reflection that the statement is true in $L^{M^+}$. So $L^{M^+}$ has a countable transitive set, such that the assertion that it has no end-extension to a model of $\ZFC+V=L$ has rank $\lambda$. But since $\lambda$ is actually well-founded, the statement would have to be actually true; but it isn’t, since $L^{M^+}$ itself is such an extension, a contradiction.

So we may assume $\lambda$ is uncountable in $M^+$. In this case, since $\lambda$ was actually well-ordered, it follows that $L^M$ is well-founded beyond its $\omega_1$. Consider the statement “there is a countable transitive set having no end-extension to a model of $\ZFC+V=L$.” This is a $\Sigma^1_2$ sentence, which is true in $M^+[G]$ by our assumption about $m$, and so by Shoenfield absoluteness, it is true in $L^{M^+}$ and hence also $L^M$. So $L^M$ thinks there is a countable transitive set $b$ having no end-extension to a model of $\ZFC+V=L$. This is a $\Pi^1_1$ assertion about $b$, whose truth is witnessed in $L^M$ by a ranking of the associated tree. Since this rank would be countable in $L^M$ and this model is well-founded up to its $\omega_1$, the tree must be actually well-founded. But this is impossible, since it is not actually true that $b$ has no such end-extension, since $L^M$ itself is such an end-extension of $b$. Contradiction. $\Box$

One can prove a somewhat stronger version of the theorem, as follows.

Theorem. For any countable model $M$ of $\ZF$, with an inner model $W\models\ZFC$, and any statement $\phi$ true in $W$, there is an end-extension of $M$ to a model of $\ZFC+\phi$. Furthermore, one can arrange that every set of $M$ is countable in the extension model.

In particular, one can find end-extensions of $\ZFC+V=L+\phi$, for any statement $\phi$ true in $L^M$.

Proof. Carry out the same proof as above, except in all the statements, ask for end-extensions of $\ZFC+\phi$, instead of end-extensions of $\ZFC+V=L$, and also ask that the set in question become countable in that extension. The final contradictions are obtained by the fact that the countable transitive sets in $L^M$ do have end-extensions like that, in which they are countable, since $W$ is such an end-extension. $\Box$

For example, we can make the following further examples.


  1. Every countable model $M$ of $\ZFC$ with a measurable cardinal has an end-extension to a model $N$ of $\ZFC+V=L[\mu]$.
  2. Every countable model $M$ of $\ZFC$ with extender-based large cardinals has an end-extension to a model $N$ satisfying $\ZFC+V=L[\vec E]$.
  3. Every countable model $M$ of $\ZFC$ with infinitely many Woodin cardinals has an end-extension to a model $N$ of $\ZF+\text{AD}+V=L(\mathbb{R})$.

And in each case, we can furthermore arrange that every set of $M$ is countable in the extension model $N$.

This proof grew out of a project on the $\Sigma_1$-definable universal finite set, which I am currently undertaking with Kameryn Williams and Philip Welch.

Jon Barwise. Infinitary methods in the model theory of set theory. In Logic
Colloquium ’69 (Proc. Summer School and Colloq., Manchester, 1969), pages
53–66. North-Holland, Amsterdam, 1971.

Infinite Sudoku and the Sudoku game

Consider what I call the Sudoku game, recently introduced in the MathOverflow question Who wins two-player Sudoku? posted by Christopher King (user PyRulez). Two players take turns placing numbers on a Sudoku board, obeying the rule that they must never explicitly violate the Sudoku condition: the numbers on any row, column or sub-board square must never repeat. The first player who cannot continue legal play loses. Who wins the game? What is the winning strategy?

The game is not about building a global Sudoku solution, since a move can be legal in this game even when it is not part of any global Sudoku solution, provided only that it doesn’t yet explicitly violate the Sudoku condition. Rather, the Sudoku game is about trying to trap your opponent in a maximal such position, a position which does not yet explicitly violate the Sudoku condition but which cannot be further extended.

In my answer to the question on MathOverflow, I followed an idea suggested to me by my daughter Hypatia, namely that on even-sized boards $n^2\times n^2$ where $n$ is even, then the second player can win with a mirroring strategy: simply copy the opponent’s moves in reflected mirror image through the center of the board. In this way, the second player ensures that the position on the board is always symmetric after her play, and so if the previous move was safe, then her move also will be safe by symmetry. This is therefore a winning strategy for the second player, since any violation of the Sudoku condition will arise on the opponent’s play.

This argument works on even-sized boards precisely because the reflection of every row, column and sub-board square is a totally different row, column and sub-board square, and so any new violation of the Sudoku conditions would reflect to a violation that was already there. The mirror strategy definitely does not work on the odd-sized boards, including the main $9\times 9$ case, since if the opponent plays on the central row, copying directly would immediately introduce a Sudoku violation.

After posting that answer, Orson Peters (user orlp) pointed out that one can modify it to form a winning strategy for the first player on odd-sized boards, including the main $9\times 9$ case. In this case, let the first player begin by playing $5$ in the center square, and then afterwards copy the opponent’s moves, but with the ten’s complement at the reflected location. So if the opponent plays $x$, then the first player plays $10-x$ at the reflected location. In this way, the first player can ensure that the board is ten’s complement symmetric after her moves. The point is that again this is sufficient to know that she will never introduce a violation, since if her $10-x$ appears twice in some row, column or sub-board square, then $x$ must have already appeared twice in the reflected row, column or sub-board square before that move.

This idea is fully general for odd-sized Sudoku boards $n^2\times n^2$, where $n$ is odd. If $n=2k-1$, then the first player starts with $k$ in the very center and afterward plays the $2k$-complement of her opponent’s move at the reflected location.


  1. On even-sized Sudoku boards, the second player wins the Sudoku game by the mirror copying strategy.
  2. On odd-sized Sudoku boards, the first players wins the Sudoku game by the complement-mirror copying strategy.

Note that on the even boards, the second player could also play complement mirror copying just as successfully.

What I really want to tell you about, however, is the infinite Sudoku game (following a suggestion of Sam Hopkins). Suppose that we try to play the Sudoku game on a board whose subboard squares are $\mathbb{Z}\times\mathbb{Z}$, so that the full board is a $\mathbb{Z}\times\mathbb{Z}$ array of those squares, making $\mathbb{Z}^2\times\mathbb{Z}^2$ altogether. (Or perhaps you might prefer the board $\mathbb{N}^2\times\mathbb{N}^2$?)

One thing to notice is that on an infinite board, it is no longer possible to get trapped at a finite stage of play, since every finite position can be extended simply by playing a totally new label from the set of labels; such a move would never lead to a new violation of the explicit Sudoku condition.

For this reason, I should like to introduce the Sudoku Solver-Spoiler game variation as follows. There are two players: the Sudoku Solver and the Sudoku Spoiler. The Solver is trying to build a global Sudoku solution on the board, while the Spoiler is trying to prevent this. Both players must obey the Sudoku condition that labels are never to be explicitly repeated in any row, column or sub-board square. On an infinite board, the game proceeds transfinitely, until the board is filled or there are no legal moves. The Solver wins a play of the game, if she successfully builds a global Sudoku solution, which means not only that every location has a label and there are no repetitions in any row, column or sub-board square, but also that every label in fact appears in every row, column and sub-board square. That is, to count as a solution, the labels on any row, column and sub-board square must be a bijection with the set of labels. (On infinite boards, this is a stronger requirement than merely insisting on no repetitions.)

The Solver-Spoiler game makes sense in complete generality on any set $S$, whether finite or infinite. The sub-boards are $S^2=S\times S$, and one has an $S\times S$ array of them, so $S^2\times S^2$ for the whole board. Every row and column has the same size as the sub-board square $S^2$, and the set of labels should also have this size.

Upon reflection, one realizes that what matters about $S$ is just its cardinality, and we really have for every cardinal $\kappa$ the $\kappa$-Sudoku Solver-Spoiler game, whose board is $\kappa^2\times\kappa^2$, a $\kappa\times\kappa$ array of $\kappa\times\kappa$ sub-boards. In particular, the game $\mathbb{Z}^2\times\mathbb{Z}^2$ is actually isomorphic to the game $\mathbb{N}^2\times\mathbb{N}^2$, despite what might feel initially like a very different board geometry.

What I claim is that the Solver has a winning strategy in the all the infinite Sudoku Solver-Spoiler games, in a very general and robust manner.

Theorem. For every infinite cardinal $\kappa$, the Solver has a winning strategy to win the $\kappa$-Sudoku Solver-Spoiler game.

  • The strategy will win in $\kappa$ many moves, producing a full Sudoku solution.
  • The Solver can win whether she goes first or second, starting from any legal position of size less than $\kappa$.
  • The Solver can win even when the Spoiler is allowed to play finitely many labels at once on each turn, or fewer than $\kappa$ many moves (if $\kappa$ is regular), even if the Solver is only allowed one move each turn.
  • In the countably infinite Sudoku game, the Solver can win even if the Spoiler is allowed to make infinitely many moves at once, provided only that the resulting position can in principle be extended to a full solution.

Proof. Consider first the countably infinite Sudoku game, and assume the initial position is finite and that the Spoiler will make finitely many moves on each turn. Consider what it means for the Solver to win at the limit. It means, first of all, that there are no explicit repetitions in any row, column or sub-board. This requirement will be ensured since it is part of the rules for legal play not to violate it. Next, the Solver wants to ensure that every square has a label on it and that every label appears at least once in every row, every column and every sub-board. If we think of these as individual specific requirements, we have countably many requirements in all, and I claim that we can arrange that the Solver will simply satisfy the $n^{th}$ requirement on her $n^{th}$ play. Given any finite position, she can always find something to place in any given square, using a totally new label if need be. Given any finite position, any row and any particular label $k$, since can always find a place on that row to place the label, which has no conflict with any column or sub-board, since there are infinitely many to choose from and only finitely many conflicts. Similarly with columns and sub-boards. So each of the requirements can always be fulfilled one-at-a-time, and so in $\omega$ many moves she can produce a full solution.

The argument works equally well no matter who goes first or if the Spoiler makes arbitrary finite play, or indeed even infinite play, provided that the play is part of some global solution (perhaps a different one each time), since on each move the Solve can simply meet the requirement by using that solution at that stage.

An essentially similar argument works when $\kappa$ is uncountable, although now the play will proceed for $\kappa$ many steps. Assuming $\kappa^2=\kappa$, a consequence of the axiom of choice, there are $\kappa$ many requirements to meet, and the Solve can meet requirement $\alpha$ on the $\alpha^{th}$ move. If $\kappa$ is regular, we can again allow the Spoiler to make arbitrary size-less-than-$\kappa$ size moves, so that at any stage of play before $\kappa$ the position will still be size less than $\kappa$. (If $\kappa$ is singular, one can allow Spoiler to make finitely many moves at once or indeed even some uniform bounded size $\delta<\kappa$ many moves at once. $\Box$

I find it interesting to draw out the following aspect of the argument:

Observation. Every finite labeling of an infinite Sudoku board that does not yet explicitly violate the Sudoku condition can be extended to a global solution.

Similarly, any size less than $\kappa$ labeling that does not yet explicitly violate the Sudoku condition can be extended to a global solution of the $\kappa$-Sudoku board for any infinite cardinal $\kappa$.

What about asymmetric boards? It has come to my attention that people sometimes look at asymmetric Sudoku boards, whose sub-boards are not square, such as in the six-by-six Sudoku case. In general, one could take Sudoku boards to consist of a $\lambda\times\kappa$ array of sub-boards of size $\kappa\times\lambda$, where $\kappa$ and $\lambda$ are cardinals, not necessarily the same size and not necessarily both infinite or both finite. How does this affect the arguments I’ve given?

In the finite $(n\times m)\times (m\times n)$ case, if one of the numbers is even, then it seems to me that the reflection through the origin strategy works for the second player just as before. And if both are odd, then the first player can again play in the center square and use the mirror-complement strategy to trap the opponent. So that analysis will work fine.

In the case $(\kappa\times\lambda)\times(\lambda\times\kappa)$ where $\lambda\leq\kappa$ and $\kappa=\lambda\kappa$ is infinite, then the proof of the theorem seems to break, since if $\lambda<\kappa$, then with only $\lambda$ many moves, say putting a common symbol in each of the $\lambda$ many rectangles across a row, we can rule out that symbol in a fixed row. So this is a configuration of size less than $\kappa$ that cannot be extended to a full solution. For this reason, it seems likely to me that the Spoiler can win the Sudoko Solver-Spoiler game in the infinite asymmetric case.

Finally, let’s consider the Sudoku Solver-Spoiler game in the purely finite case, which actually is a very natural game, perhaps more natural than what I called the Sudoku game above. It seems to me that the Spoiler should be able to win the Solver-Spoiler game on any nontrivial finite board. But I don’t yet have an argument proving this. I asked a question on MathOverflow: The Sudoku game: Solver-Spoiler variation.

Different set theories are never bi-interpretable

I was fascinated recently to discover something I hadn’t realized about relative interpretability in set theory, and I’d like to share it here. Namely,

Different set theories extending ZF are never bi-interpretable!

For example, ZF and ZFC are not bi-interpretable, and neither are ZFC and ZFC+CH, nor ZFC and ZFC+$\neg$CH, despite the fact that all these theories are equiconsistent. The basic fact is that there are no nontrivial instances of bi-interpretation amongst the models of ZF set theory. This is surprising, and could even be seen as shocking, in light of the philosophical remarks one sometimes hears asserted in the philosophy of set theory that what is going on with the various set-theoretic translations from large cardinals to determinacy to inner model theory, to mention a central example, is that we can interpret between these theories and consequently it doesn’t much matter which context is taken as fundamental, since we can translate from one context to another without loss.

The bi-interpretation result shows that these interpretations do not and cannot rise to the level of bi-interpretations of theories — the most robust form of mutual relative interpretability — and consequently, the translations inevitably must involve a loss of information.

To be sure, set theorists classify the various set-theoretic principles and theories into a hierarchy, often organized by consistency strength or by other notions of interpretative power, using forcing or definable inner models. From any model of ZF, for example, we can construct a model of ZFC, and from any model of ZFC, we can construct models of ZFC+CH or ZFC+$\neg$CH and so on. From models with sufficient large cardinals we can construct models with determinacy or inner-model-theoretic fine structure and vice versa. And while we have relative consistency results and equiconsistencies and even mutual interpretations, we will have no nontrivial bi-interpretations.

(I had proved the theorem a few weeks ago in joint work with Alfredo Roque Freire, who is visiting me in New York this year. We subsequently learned, however, that this was a rediscovery of results that have evidently been proved independently by various authors. Albert Visser proves the case of PA in his paper, “Categories of theories and interpretations,” Logic in Tehran, 284–341, Lect. Notes Log., 26, Assoc. Symbol. Logic, La Jolla, CA, 2006, (pdf, see pp. 52-55). Ali Enayat gave a nice model-theoretic argument for showing specifically that ZF and ZFC are not bi-interpretable, using the fact that ZFC models can have no involutions in their automorphism groups, but ZF models can; and he proved the general version of the theorem, for ZF, second-order arithmetic $Z_2$ and second-order set theory KM in his 2016 article, A. Enayat, “Variations on a Visserian theme,” in Liber Amicorum Alberti : a tribute to Albert Visser / Jan van Eijck, Rosalie Iemhoff and Joost J. Joosten (eds.) Pages, 99-110. ISBN, 978-1848902046. College Publications, London. The ZF version was apparently also observed independently by Harvey Friedman, Visser and Fedor Pakhomov.)

Meanwhile, let me explain our argument. Recall from model theory that one theory $S$ is interpreted in another theory $T$, if in any model of the latter theory $M\models T$, we can define (and uniformly so in any such model) a certain domain $N\subset M^k$ and relations and functions on that domain so as to make $N$ a model of $S$. For example, the theory of algebraically closed fields of characteristic zero is interpreted in the theory of real-closed fields, since in any real-closed field $R$, we can consider pairs $(a,b)$, thinking of them as $a+bi$, and define addition and multiplication on those pairs in such a way so as to construct an algebraically closed field of characteristic zero.

Two theories are thus mutually interpretable, if each of them is interpretable in the other. Such theories are necessarily equiconsistent, since from any model of one of them we can produce a model of the other.

Note that mutual interpretability, however, does not insist that the two translations are inverse to each other, even up to isomorphism. One can start with a model of the first theory $M\models T$ and define the interpreted model $N\models S$ of the second theory, which has a subsequent model of the first theory again $\bar M\models T$ inside it. But the definition does not insist on any particular connection between $M$ and $\bar M$, and these models need not be isomorphic nor even elementarily equivalent in general.

By addressing this, one arrives at a stronger and more robust form of mutual interpretability. Namely, two theories $S$ and $T$ are bi-interpretable, if they are mutually interpretable in such a way that the models can see that the interpretations are inverse. That is, for any model $M$ of the theory $T$, if one defines the interpreted model $N\models S$ inside it, and then defines the interpreted model $\bar M$ of $T$ inside $N$, then $M$ is isomorphic to $\bar M$ by a definable isomorphism in $M$, and uniformly so (and the same with the theories in the other direction). Thus, every model of one of the theories can see exactly how it itself arises definably in the interpreted model of the other theory.

For example, the theory of linear orders $\leq$ is bi-interpretable with the theory of strict linear order $<$, since from any linear order $\leq$ we can define the corresponding strict linear order $<$ on the same domain, and from any strict linear order $<$ we can define the corresponding linear order $\leq$, and doing it twice brings us back again to the same order.

For a richer example, the theory PA is bi-interpretable with the finite set theory $\text{ZF}^{\neg\infty}$, where one drops the infinity axiom from ZF and replaces it with the negation of infinity, and where one has the $\in$-induction scheme in place of the foundation axiom. The interpretation is via the Ackerman encoding of hereditary finite sets in arithmetic, so that $n\mathrel{E} m$ just in case the $n^{th}$ binary digit of $m$ is $1$. If one starts with the standard model $\mathbb{N}$, then the resulting structure $\langle\mathbb{N},E\rangle$ is isomorphic to the set $\langle\text{HF},\in\rangle$ of hereditarily finite sets. More generally, by carrying out the Ackermann encoding in any model of PA, one thereby defines a model of $\text{ZF}^{\neg\infty}$, whose natural numbers are isomorphic to the original model of PA, and these translations make a bi-interpretation.

We are now ready to prove that this bi-interpretation situation does not occur with different set theories extending ZF.

Theorem. Distinct set theories extending ZF are never bi-interpretable. Indeed, there is not a single model-theoretic instance of bi-interpretation occurring with models of different set theories extending ZF.

Proof. I mean “distinct” here in the sense that the two theories are not logically equivalent; they do not have all the same theorems. Suppose that we have a bi-interpretation instance of the theories $S$ and $T$ extending ZF. That is, suppose we have a model $\langle M,\in\rangle\models T$ of the one theory, and inside $M$, we can define an interpreted model of the other theory $\langle N,\in^N\rangle\models S$, so the domain of $N$ is a definable class in $M$ and the membership relation $\in^N$ is a definable relation on that class in $M$; and furthermore, inside $\langle N,\in^N\rangle$, we have a definable structure $\langle\bar M,\in^{\bar M}\rangle$ which is a model of $T$ again and isomorphic to $\langle M,\in^M\rangle$ by an isomorphism that is definable in $\langle M,\in^M\rangle$. So $M$ can define the map $a\mapsto \bar a$ that forms an isomorphism of $\langle M,\in^M\rangle$ with $\langle \bar M,\in^{\bar M}\rangle$. Our argument will work whether we allow parameters in any of these definitions or not.

I claim that $N$ must think the ordinals of $\bar M$ are well-founded, for otherwise it would have some bounded cut $A$ in the ordinals of $\bar M$ with no least upper bound, and this set $A$ when pulled back pointwise by the isomorphism of $M$ with $\bar M$ would mean that $M$ has a cut in its own ordinals with no least upper bound; but this cannot happen in ZF.

If the ordinals of $N$ and $\bar M$ are isomorphic in $N$, then all three models have isomorphic ordinals in $M$, and in this case, $\langle M,\in^M\rangle$ thinks that $\langle N,\in^N\rangle$ is a well-founded extensional relation of rank $\text{Ord}$. Such a relation must be set-like (since there can be no least instance where the predecessors form a proper class), and so $M$ can perform the Mostowski collapse of $\in^N$, thereby realizing $N$ as a transitive class $N\subseteq M$ with $\in^N=\in^M\upharpoonright N$. Similarly, by collapsing we may assume $\bar M\subseteq N$ and $\in^{\bar M}=\in^M\upharpoonright\bar M$. So the situation consists of inner models $\bar M\subseteq N\subseteq M$ and $\langle \bar M,\in^M\rangle$ is isomorphic to $\langle M,\in^M\rangle$ in $M$. This is impossible unless all three models are identical, since a simple $\in^M$-induction shows that $\pi(y)=y$ for all $y$, because if this is true for the elements of $y$, then $\pi(y)=\{\pi(x)\mid x\in y\}=\{x\mid x\in y\}=y$. So $\bar M=N=M$ and so $N$ and $M$ satisfy the same theory, contrary to assumption.

If the ordinals of $\bar M$ are isomorphic to a proper initial segment of the ordinals of $N$, then a similar Mostowski collapse argument would show that $\langle\bar M,\in^{\bar M}\rangle$ is isomorphic in $N$ to a transitive set in $N$. Since this structure in $N$ would have a truth predicate in $N$, we would be able to pull this back via the isomorphism to define (from parameters) a truth predicate for $M$ in $M$, contrary to Tarski’s theorem on the non-definability of truth.

The remaining case occurs when the ordinals of $N$ are isomorphic in $N$ to an initial segment of the ordinals of $\bar M$. But this would mean that from the perspective of $M$, the model $\langle N,\in^N\rangle$ has some ordinal rank height, which would mean by the Mostowski collapse argument that $M$ thinks $\langle N,\in^N\rangle$ is isomorphic to a transitive set. But this contradicts the fact that $M$ has an injection of $M$ into $N$. $\Box$

It follows that although ZF and ZFC are equiconsistent, they are not bi-interpretable. Similarly, ZFC and ZFC+CH and ZFC+$\neg$CH are equiconsistent, but no pair of them is bi-interpretable. And again with all the various equiconsistency results concerning large cardinals.

A similar argument works with PA to show that different extensions of PA are never bi-interpretable.

Nonstandard models of arithmetic arise in the complex numbers

I’d like to explain that one may find numerous nonstandard models of arithmetic as substructures of the field of complex numbers.

The issue arose yesterday at Hans Schoutens’s talk for the CUNY Logic Workshop. The main focus of the talk was the question, for a given algebraically closed field $k$ of characteristic zero and a given model of arithmetic $\Gamma\models$PA, whether $\Gamma$ and $k$ were jointly realizable as the set of powers (as he defines it) and the set of units of a model $S$ of the generalized theory of polynomial rings over fields. Very interesting stuff.

During the talk, a side question arose, concerning exactly which models of PA arise as substructures of the field of complex numbers.

Question. Which models of PA arise as substructures of the field of complex numbers $\langle\mathbb{C},+,\cdot\rangle$?

Of course the standard model $\mathbb{N}$ arises this way, and some people thought at first it should be difficult to realize nonstandard models of PA as substructures of $\mathbb{C}$. After some back and forth, the question was ultimately answered by Alfred Dolich in the pub after the seminar, and I’d like to give his argument here (but see the Mlček reference below).  This is a case where a problem that was initially confusing becomes completely clear!

Theorem. Every model of PA of size at most continuum arises as a sub-semiring of the field of complex numbers $\langle\mathbb{C},+,\cdot\rangle$.

Proof. Suppose that $M$ is a model of PA of size at most continuum. Inside $M$, we may form $M$’s version of the algebraic numbers $A=\bar{\mathbb{Q}}^M$, the field that $M$ thinks is the algebraic closure of its version of the rationals. So $A$ is an algebraically closed field of characteristic zero, which has an elementary extension to such a field of size continuum. Since the theory of algebraically closed fields of characteristic zero is categorical in all uncountable powers, it follows that $A$ is isomorphic to a submodel of $\mathbb{C}$. Since $M$ itself is isomorphic to a substructure of its rationals $\mathbb{Q}^M$, which sit inside $A$, it follows that $M$ is isomorphic to a substructure of $\mathbb{C}$, as claimed. QED

In particular, every countable model of PA can be found as a substructure of the complex numbers.

Essentially the same argument shows the following.

Theorem. If $k$ is an uncountable algebraically closed field of characteristic zero, then every model of arithmetic $M\models$PA of size at most the cardinality of $k$ embeds into $k$.

I’ve realized that the same collection of ideas shows the following striking way to look upon the complex numbers:

Theorem. The complex numbers $\mathbb{C}$ can be viewed as a nonstandard version of the algebraic numbers $\bar{\mathbb{Q}}^M$ inside a nonstandard model $M$ of PA. Indeed, for every uncountable algebraically closed field $F$ of characteristic zero and every model of arithmetic $M\models$PA of the same cardinality, the field $F$ is isomorphic to the nonstandard algebraic numbers $\bar{\mathbb{Q}}^M$ as $M$ sees them.

Proof. Fix any such field $F$, such as the complex numbers themselves, and consider any nonstandard model of arithmetic $M$ of the same cardinality. The field $\bar{\mathbb{Q}}^M$, which is $M$’s nonstandard version of the algebraic numbers, is an algebraically closed field of characteristic zero and same uncountable size as $F$. By categoricity, these fields are isomorphic. $\Box$

I had suspected that these results were folklore in the model-theoretic community, and it has come to my attention that proper credit for the main observation of this post seems to be due to Jozef Mlček, who proved it in 1973. Thanks to Jerome Tauber for the reference, which he provided in the comments.

Discussion of McCallum’s paper on Reinhardt cardinals in ZF

Update: Rupert has withdrawn his claim. See the final bullet point below.

Rupert McCallum has posted a new paper to the mathematics arXiv

Rupert McCallum, The choiceless cardinals are inconsistent, mathematics arXiv 2017: 1712.09678.

He is claiming to establish the Kunen inconsistency in ZF, without the axiom of choice, which is a long-standing open question. In particular, this would refute the Reinhardt cardinals in ZF and all the stronger ZF large cardinals that have been studied.

If correct, this result will constitute a central advance in large cardinal set theory.

I am making this post to provide a place to discuss the proof and any questions that people might have about it. Please feel free to post comments with questions or answers to other questions that have been posted. I will plan to periodically summarize things in the main body of this post as the discussion proceeds.

  • My first question concerns lemma 0.4, where he claims that $j’\upharpoonright V_{\lambda+2}^N$ is a definable class in $N$. He needs this to get the embedding into $N$, but I don’t see why the embedding should be definable here.
  • I wrote to Rupert about this concern, and he replied that it may be an issue, and that he intends to post a new version of his paper, where he may retreat to the weaker claim refuting only the super-Reinhardt cardinals.
  • The updated draft is now available. Follow the link above. It will become also available on the arXiv later this week.
  • The second January 2 draft has a new section claiming again the original refutation of Reinhardt cardinals.
  • New draft January 3. Rupert has reportedly been in communication with Matteo Viale about his result.
  • Rupert has announced (Jan 3) that he is going to take a week or so to produce a careful rewrite.
  • He has made available his new draft, January 7. It will also be posted on the arXiv.
  • January 8:  In light of the issues identified on this blog, especially the issue mentioned by Gabe, Rupert has sent me an email stating (and asking me to post here) that he is planning to think it through over the next couple of weeks and will then make some kind of statement about whether he thinks he can save the argument.  For the moment, therefore, it seems that we should consider the proof to be on hold.
  • January 24: After consideration, Rupert has withdrawn the claim, sending me the following message:

    “Gabriel has very kindly given me extensive feedback on many different drafts. I attach the latest version which he commented on [January 24 draft above]. He has identified the flaw, namely that on page 3 I claim that $\exists n \forall Y \in W_n \psi(Y)$ if and only if $\forall Y \in U \psi(Y)$. This claim is not justified, and this means that there is no way that is apparent to me to rescue the proof of Lemma 1.2. Gabriel has directed me to a paper of Laver which does indeed show that my mapping e is an elementary embedding but which does not give the stronger claim that I want.


    …So, I withdraw my claim. It is possible that this method of proof can work somehow, but some new insight is needed to make it work.”


    -Rupert McCallum, January 24, 2018


Here is an interesting game I heard a few days ago from one of my undergraduate students; I’m not sure of the provenance.

The game is played with stones on a grid, which extends indefinitely upward and to the right, like the lattice $\mathbb{N}\times\mathbb{N}$.  The game begins with three stones in the squares nearest the origin at the lower left.  The goal of the game is to vacate all stones from those three squares. At any stage of the game, you may remove a stone and replace it with two stones, one in the square above and one in the square to the right, provided that both of those squares are currently unoccupied.

For example, here is a sample play.

Question. Can you play so as completely to vacate the yellow corner region?

One needs only to move the other stones out of the way so that the corner stones have room to move out. Can you do it? It isn’t so easy, but I encourage you to try.

Here is an online version of the game that I coded up quickly in Scratch: Escape!

My student mentioned the problem to me and some other students in my office on the day of the final exam, and we puzzled over it, but then it was time for the final exam. So I had a chance to think about it while giving the exam and came upon a solution. I’ll post my answer later on, but I’d like to give everyone a chance to think about it first.

Solution. Here is the solution I hit upon, and it seems that many others also found this solution. The main idea is to assign an invariant to the game positions. Let us assign weights to the squares in the lattice according to the following pattern. We give the corner square weight $1/2$, the next diagonal of squares $1/4$ each, and then $1/8$, and so on throughout the whole playing board. Every square should get a corresponding weight according to the indicated pattern.

The weights are specifically arranged so that making a move in the game preserves the total weight of the occupied squares. That is, the total weight of the occupied squares is invariant as play proceeds, because moving a stone with weight $1/2^k$ will create two stones of weight $1/2^{k+1}$, which adds up to the same. Since the original three stones have total weight $\frac 12+\frac14+\frac14=1$, it follows that the total weight remains $1$ after every move in the game.

Meanwhile, let us consider the total weight of all the squares on the board. If you consider the bottom row only, the weights add to $\frac12+\frac14+\frac18+\cdots$, which is the geometric series with sum $1$. The next row has total weight $\frac14+\frac18+\frac1{16}+\cdots$, which adds to $1/2$. And the next adds to $1/4$ and so on. So the total weight of all the squares on the board is $1+\frac12+\frac14+\cdots$, which is $2$.  Since we have $k$ stones with weight $1/2^k$, another way to think about it is that we are essentially establishing the sum $\sum_k\frac k{2^k}=2$.

The subtle conclusion is that after any finite number of moves, only finitely many of those other squares are occupied, and so some of them remain empty. So after only finitely many moves, the total weight of the occupied squares off of the original L-shape is strictly less than $1$. Since the total weight of all the occupied squares is exactly $1$, this means that the L-shape has not been vacated.

So it is impossible to vacate the original L-shape in finitely many moves. $\Box$

Suppose that we relax the one-stone-per-square requirement, and allow you to stack several stones on a single square, provided that you eventually unstack them. In other words, can you play the stacked version of the game, so as to vacate the original three squares, provided that all the piled-up stones eventually are unstacked?

No, it is impossible! And the proof is the same invariant-weight argument as above. The invariance argument does not rely on the one-stone-per-square rule during play, since it is still an invariant if one multiplies the weight of a square by the number of stones resting upon it. So we cannot transform the original stones, with total weight $1$, to any finite number of stones on the rest of the board (with one stone per square in the final position), since those other squares do not have sufficient weight to add up to $1$, even if we allow them to be stacked during intermediate stages of play.

Meanwhile, let us consider playing the game on a finite $n\times n$ board, with the rule modified so that stones that would be created in row or column $n+1$ in the infinite game simply do not materialize in the $n\times n$ game. This breaks the proof, since the weight is no longer an invariant for moves on the outer edges. Furthermore, one can win this version of the game. It is easy to see that one can systematically vacate all stones on the upper and outer edges, simply by moving any of them that is available, pushing the remaining stones closer to the outer corner and into oblivion. Similarly, one can vacate the penultimate outer edges, by doing the same thing, which will push stones into the outer edges, which can then be vacated. By reverse induction from the outer edges in, one can vacate every single row and column. Thus, for play on this finite board with the modified rule on the outer edges, one can vacate the entire $n\times n$ board!

Indeed, in the finite $n\times n$ version of the game, there is no way to lose! If one simply continues making legal moves as long as this is possible, then the board will eventually be completely vacated. To see this, notice first that if there are stones on the board, then there is at least one legal move. Suppose that we can make an infinite sequence of legal moves on the $n\times n$ board. Since there are only finitely many squares, some of the squares must have been moved-upon infinitely often. If you consider such a square closest to the origin (or of minimal weight in the scheme of weights above), then since the lower squares are activated only finitely often, it is clear that eventually the given square will replenished for the last time. So it cannot have been activated infinitely often. (Alternatively, argue by induction on squares from the lower left that they are moved-upon at most finitely often.) Indeed, I claim that the number of steps to win, vacating the $n\times n$ board, does not depend on the order of play. One can see this by thinking about the path of a given stone and its clones through the board, ignoring the requirement that a given square carries only one stone. That is, let us make all the moves in parallel time. Since there is no interaction between the stones that would otherwise interfere, it is clear that the number of stones appearing on a given square in total is independent of the order of play. A tenacious person could calculate the sum exactly: each square is becomes occupied by a number of stones that is equal to the number of grid paths to it from one of the original three stones, and one could use this sum to calculate the total length of play on the $n\times n$ board.

Still don’t know, an epistemic logic puzzle

Here is a epistemic logic puzzle that I wrote for my students in the undergraduate logic course I have been teaching this semester at the College of Staten Island at CUNY.  We had covered some similar puzzles in lecture, including Cheryl’s Birthday and the blue-eyed islanders.

Bonus Question. Suppose that Alice and Bob are each given a different fraction, of the form $\frac{1}{n}$, where $n$ is a positive integer, and it is commonly known to them that they each know only their own number and that it is different from the other one. The following conversation ensues.


JDH: I privately gave you each a different rational number of the form $\frac{1}{n}$. Who has the larger number?

Alice: I don’t know.

Bob: I don’t know either.

Alice: I still don’t know.

Bob: Suddenly, now I know who has the larger number.

Alice: In that case, I know both numbers.

What numbers were they given?

Give the problem a try! See the solution posted below.

Meanwhile, for a transfinite epistemic logic challenge — considerably more difficult — see my puzzle Cheryl’s rational gifts.












When Alice says she doesn’t know, in her first remark, the meaning is exactly that she doesn’t have $\frac 11$, since that is only way she could have known who had the larger number.  When Bob replies after this that he doesn’t know, then it must be that he also doesn’t have $\frac 11$, but also that he doesn’t have $\frac 12$, since in either of these cases he would know that he had the largest number, but with any other number, he couldn’t be sure. Alice replies to this that she still doesn’t know, and the content of this remark is that Alice has neither $\frac 12$ nor $\frac 13$, since in either of these cases, and only in these cases, she would have known who has the larger number. Bob replies that suddenly, he now knows who has the larger number. The only way this could happen is if he had either $\frac 13$ or $\frac 14$, since in either of these cases he would have the larger number, but otherwise he wouldn’t know whether Alice had $\frac 14$ or not. But we can’t be sure yet whether Bob has $\frac 13$ or $\frac 14$. When Alice says that now she knows both numbers, however, then it must be because the information that she has allows her to deduce between the two possibilities for Bob. If she had $\frac 15$ or smaller, she wouldn’t be able to distinguish the two possibilities for Bob. Since we already ruled out $\frac 13$ for her, she must have $\frac 14$. So Alice has $\frac 14$ and Bob has $\frac 13$.

Many of the commentators came to the same conclusion. Congratulations to all who solved the problem! See also the answers posted on my math.stackexchange question and on Twitter:

Epistemic logic puzzle: Still Don’t Know.

Second-order transfinite recursion is equivalent to Kelley-Morse set theory over GBC

A few years ago, I had observed after hearing a talk by Benjamin Rin that the principle of first-order transfinite recursion for set well-orders is equivalent to the replacement axiom over Zermelo set theory, and thus we may take transfinite recursion as a fundamental set-theoretic principle, one which yields full ZFC when added to Zermelo’s weaker theory (plus foundation).

In later work, Victoria Gitman and I happened to prove that the principle of elementary transfinite recursion ETR, which allows for first-order class recursions along proper class well-orders (not necessarily set-like) is equivalent to the principle of determinacy for clopen class games [1]. Thus, once again, a strong recursion principle exhibited robustness as a fundamental set-theoretic principle.

The theme continued in recent joint work on the class forcing theorem, in which Victoria Gitman, myself, Peter Holy, Philipp Schlicht and Kameryn Williams [2] proved that the principle $\text{ETR}_{\text{Ord}}$, which allows for first-order class recursions of length $\text{Ord}$, is equivalent to twelve natural set-theoretic principles, including the existence of forcing relations for class forcing notions, the existence of Boolean completions for class partial orders, the existence of various kinds of truth predicates for infinitary logics, the existence of $\text{Ord}$-iterated truth predicates, and to the principle of determinacy for clopen class games of rank at most $\text{Ord}+1$.

A few days ago, a MathOverflow question of Alec Rhea’s — Is there a stronger form of recursion? — led me to notice that one naturally gains additional strength by pushing the recursion principles further into second-order set theory.

So let me introduce the second-order recursion principle STR and make the comparatively simple observation that over Gödel-Bernays GBC set theory this is equivalent to Kelley-Morse set theory KM. Thus, we may take this kind of recursion as a fundamental set-theoretic principle.

Definition. In the context of second-order set theory, the principle of second-order transfinite recursion, denoted STR, asserts of any formula $\varphi$ in the second-order language of set theory, that if $\Gamma=\langle I,\leq_\Gamma\rangle$ is any class well-order and $Z$ is any class parameter, then there is a class $S\subset I\times V$ that is a solution of the recursion, in the sense that
$$S_i=\{\ x\ \mid\  \varphi(x,S\upharpoonright i,Z)\ \}$$
for every $i\in I$, where where $S_i=\{\ x\ \mid\ (i,x)\in S\ \}$ is the section on coordinate $i$ and where $S\upharpoonright i=\{\ (j,x)\in S\ \mid\ j<_\Gamma i\ \}$ is the part of the solution at stages below $i$ with respect to $\Gamma$.

Theorem. The principle of second-order transfinite recursion STR is equivalent over GBC to the second-order comprehension principle. In other words, GBC+STR is equivalent to KM.

Proof. Kelley-Morse set theory proves that every second-order recursion has a solution in the same way that ZFC proves that every set-length well-ordered recursion has a solution. Namely, we simply consider the classes which are partial solutions to the recursion, in that they obey the recursive requirement, but possibly only on an initial segment of the well-order $\Gamma$. We may easily show by induction that any two such partial solutions agree on their common domain (this uses second-order comprehension in order to find the least point of disagreement, if any), and we can show that any given partial solution, if not already a full solution, can be extended to a partial solution on a strictly longer initial segment. Finally, we show that the common values of all partial solutions is therefore a solution of the recursion. This final step uses second-order comprehension in order to define what the common values are for the partial solutions to the recursion.

Conversely, the principle of second-order transfinite recursion clearly implies the second-order comprehension axiom, by considering recursions of length one. For any second-order assertion $\varphi$ and class parameter $Z$, we may deduce that $\{x\mid \varphi(x,Z)\}$ is a class, and so the second-order class comprehension principle holds. $\Box$

It is natural to consider various fragments of STR, such as $\Sigma^1_n\text{-}\text{TR}_\Gamma$, which is the assertion that every $\Sigma^1_n$-formula $\varphi$ admits a solution for recursions of length $\Gamma$.  Such principles are provable in proper fragments of KM, since for a given level of complexity, we only need a corresponding fragment of comprehension to undertake the proof that the recursion has a solution. The full STR asserts $\Sigma^1_\omega\text{-}\text{TR}$, allowing any length. The theorem shows that STR is equivalent to recursions of length $1$, since once you get the second-order comprehension principle, then you get solutions for recursions of any length. Thus, with second-order transfinite recursion, a little goes a long way. Perhaps it is more natural to think of transfinite recursion in this context not as axiomatizing KM, since it clearly implies second-order comprehension straight away, but rather as an apparent strengthening of KM that is actually provable in KM. This contrasts with the first-order situation of ETR with respect to GBC, where GBC+ETR does make a proper strengthening of GBC.

    • V. Gitman and J. D. Hamkins, “Open determinacy for class games,” in Foundations of Mathematics, Logic at Harvard, Essays in Honor of Hugh Woodin’s 60th Birthday, A. E. Caicedo, J. Cummings, P. Koellner, and P. Larson, Eds., , 2016, Newton Institute preprint ni15064.
      author = {Victoria Gitman and Joel David Hamkins},
      title = {Open determinacy for class games},
      booktitle = {{Foundations of Mathematics, Logic at Harvard, Essays in Honor of Hugh Woodin's 60th Birthday}},
      publisher = {},
      year = {2016},
      editor = {Andr\'es E. Caicedo and James Cummings and Peter Koellner and Paul Larson},
      volume = {},
      number = {},
      series = {AMS Contemporary Mathematics},
      type = {},
      chapter = {},
      pages = {},
      address = {},
      edition = {},
      month = {},
      note = {Newton Institute preprint ni15064},
      url = {},
      eprint = {1509.01099},
      archivePrefix = {arXiv},
      primaryClass = {math.LO},
      abstract = {},
      keywords = {},
    • V. Gitman, J. D. Hamkins, P. Holy, P. Schlicht, and K. Williams, “The exact strength of the class forcing theorem,” Journal of Symbolic Logic, 2020.
      author = {Victoria Gitman and Joel David Hamkins and Peter Holy and Philipp Schlicht and Kameryn Williams},
      title = {The exact strength of the class forcing theorem},
      journal = {Journal of Symbolic Logic},
      year = {2020},
      month = {},
      volume = {},
      number = {},
      pages = {},
      note = {},
      abstract = {},
      keywords = {to-appear},
      source = {},
      doi = {},
      eprint = {1707.03700},
      archivePrefix = {arXiv},
      primaryClass = {math.LO},
      url = {},
Photo by Petar Milošević (Own work) [CC BY-SA 4.0], via Wikimedia Commons

The universal definition — it can define any mathematical object you like, in the right set-theoretic universe

Alex_wong_2015_(Unsplash) (1)In set theory, we have the phenomenon of the universal definition. This is a property $\phi(x)$, first-order expressible in the language of set theory, that necessarily holds of exactly one set, but which can in principle define any particular desired set that you like, if one should simply interpret the definition in the right set-theoretic universe. So $\phi(x)$ could be defining the set of real numbes $x=\mathbb{R}$ or the integers $x=\mathbb{Z}$ or the number $x=e^\pi$ or a certain group or a certain topological space or whatever set you would want it to be. For any mathematical object $a$, there is a set-theoretic universe in which $a$ is the unique object $x$ for which $\phi(x)$.

The universal definition can be viewed as a set-theoretic analogue of the universal algorithm, a topic on which I have written several recent posts:

Let’s warm up with the following easy instance.

Theorem. Any particular real number $r$ can become definable in a forcing extension of the universe.

Proof. By Easton’s theorem, we can control the generalized continuum hypothesis precisely on the regular cardinals, and if we start (by forcing if necessary) in a model of GCH, then there is a forcing extension where $2^{\aleph_n}=\aleph_{n+1}$ just in case the $n^{th}$ binary digit of $r$ is $1$. In the resulting forcing extension $V[G]$, therefore, the real $r$ is definable as: the real whose binary digits conform with the GCH pattern on the cardinals $\aleph_n$. QED

Since this definition can be settled in a rank-initial segment of the universe, namely, $V_{\omega+\omega}$, the complexity of the definition is $\Delta_2$. See my post on Local properties in set theory to see how I think about locally verifiable and locally decidable properties in set theory.

If we push the argument just a little, we can go beyond the reals.

Theorem. There is a formula $\psi(x)$, of complexity $\Sigma_2$, such that for any particular object $a$, there is a forcing extension of the universe in which $\psi$ defines $a$.

Proof. Fix any set $a$. By the axiom of choice, we may code $a$ with a set of ordinals $A\subset\kappa$ for some cardinal $\kappa$. (One well-orders the transitive closure of $\{a\}$ and thereby finds a bijection $\langle\mathop{tc}(\{a\}),\in\rangle\cong\langle\kappa,E\rangle$ for some $E\subset\kappa\times\kappa$, and then codes $E$ to a set $A$ by an ordinal pairing function. The set $A$ tells you $E$, which tells you $\mathop{tc}(\{a\})$ by the Mostowski collapse, and from this you find $a$.) By Easton’s theorem, there is a forcing extension $V[G]$ in which the GCH holds at all $\aleph_{\lambda+1}$ for a limit ordinal $\lambda<\kappa$, but fails at $\aleph_{\kappa+1}$, and such that $\alpha\in A$ just in case $2^{\aleph_{\alpha+2}}=\aleph_{\alpha+3}$ for $\alpha<\kappa$. That is, we manipulate the GCH pattern to exactly code both $\kappa$ and the elements of $A\subset\kappa$. Let $\phi(x)$ assert that $x$ is the set that is decoded by this process: look for the first stage where the GCH fails at $\aleph_{\lambda+2}$, and then extract the set $A$ of ordinals, and then check if $x$ is the set coded by $A$. The assertion $\phi(x)$ did not depend on $a$, and since it can be verified in any sufficiently large $V_\theta$, the assertion $\phi(x)$ has complexity $\Sigma_2$. QED

Let’s try to make a better universal definition. As I mentioned at the outset, I have been motivated to find a set-theoretic analogue of the universal algorithm, and in that computable context, we had a universal algorithm that could not only produce any desired finite set, when run in the right universe, but which furthermore had a robust interaction between models of arithmetic and their top-extensions: any set could be extended to any other set for which the algorithm enumerated it in a taller universe. Here, I’d like to achieve the same robustness of interaction with the universal definition, as one moves from one model of set theory to a taller model. We say that one model of set theory $N$ is a top-extension of another $M$, if all the new sets of $N$ have rank totally above the ranks occuring in $M$. Thus, $M$ is a rank-initial segment of $N$. If there is a least new ordinal $\beta$ in $N\setminus M$, then this is equivalent to saying that $M=V_\beta^N$.

Theorem. There is a formula $\phi(x)$, such that

  1. In any model of ZFC, there is a unique set $a$ satisfying $\phi(a)$.
  2. For any countable model $M\models\text{ZFC}$ and any $a\in M$, there is a top-extension $N$ of $M$ such that $N\models \phi(a)$.

Thus, $\phi(x)$ is the universal definition: it always defines some set, and that set can be any desired set, even when moving from a model $M$ to a top-extension $N$.

Proof. The previous manner of coding will not achieve property 2, since the GCH pattern coding started immediately, and so it would be preserved to any top extension. What we need to do is to place the coding much higher in the universe, so that in the top extension $N$, it will occur in the part of $N$ that is totally above $M$.

But consider the following process. In any model of set theory, let $\phi(x)$ assert that $x$ is the empty set unless the GCH holds at all sufficiently large cardinals, and indeed $\phi(x)$ is false unless there is a cardinal $\delta$ and ordinal $\gamma<\delta^+$ such that the GCH holds at all cardinals above $\aleph_{\delta+\gamma}$. In this case, let $\delta$ be the smallest such cardinal for which that is true, and let $\gamma$ be the smallest ordinal working with this $\delta$. So both $\delta$ and $\gamma$ are definable. Now, let $A\subset\gamma$ be the set of ordinals $\alpha$ for which the GCH holds at $\aleph_{\delta+\alpha+1}$, and let $\phi(x)$ assert that $x$ is the set coded by the set $A$.

It is clear that $\phi(x)$ defines a unique set, in any model of ZFC, and so (1) holds. For (2), suppose that $M$ is a countable model of ZFC and $a\in M$. It is a fact that every countable model of ZFC has a top-extension, by the definable ultrapower method. Let $N_0$ be a top extension of $M$. Let $N=N_0[G]$ be a forcing extension of $N_0$ in which the set $a$ is coded into the GCH pattern very high up, at cardinals totally above $M$, and such that the GCH holds above this coding, in such a way that the process described in the previous paragraph would define exactly the set $a$. So $\phi(a)$ holds in $N$, which is a top-extension of $M$ as no new sets of small rank are added by the forcing. So statement (2) also holds. QED

The complexity of the definition is $\Pi_3$, mainly because in order to know where to look for the coding, one needs to know the ordinals $\delta$ and $\gamma$, and so one needs to know that the GCH always holds above that level. This is a $\Pi_3$ property, since it cannot be verified locally only inside some $V_\theta$.

A stronger analogue with the universal algorithm — and this is a question that motivated my thinking about this topic — would be something like the following:

Question. Is there is a $\Sigma_2$ formula $\varphi(x)$, that is, a locally verifiable property, with the following properties?

  1. In any model of ZFC, the class $\{x\mid\varphi(x)\}$ is a set.
  2. It is consistent with ZFC that $\{x\mid\varphi(x)\}$ is empty.
  3. In any countable model $M\models\text{ZFC}$ in which $\{x\mid\varphi(x)\}=a$ and any set $b\in M$ with $a\subset b$, then there is a top-extension $N$ of $M$ in which $\{x\mid\varphi(x)\}=b$.

An affirmative answer would be a very strong analogue with the universal algorithm and Woodin’s theorem about which I wrote previously. The idea is that the $\Sigma_2$ properties $\varphi(x)$ in set theory are analogous to the computably enumerable properties in computability theory. Namely, to verify that an object has a certain computably enumerable property, we run a particular computable process and then sit back, waiting for the process to halt, until a stage of computation arrives at which the property is verified. Similarly, in set theory, to verify that a set has a particular $\Sigma_2$ property, we sit back watching the construction of the cumulative set-theoretic universe, until a stage $V_\beta$ arrives that provides verification of the property. This is why in statement (3) we insist that $a\subset b$, since the $\Sigma_2$ properties are always upward absolute to top-extensions; once an object is placed into $\{x\mid\varphi(x)\}$, then it will never be removed as one makes the universe taller.

So the hope was that we would be able to find such a universal $\Sigma_2$ definition, which would serve as a set-theoretic analogue of the universal algorithm used in Woodin’s theorem.

If one drops the first requirement, and allows $\{x\mid \varphi(x)\}$ to sometimes be a proper class, then one can achieve a positive answer as follows.

Theorem. There is a $\Sigma_2$ formula $\varphi(x)$ with the following properties.

  1. If the GCH holds, then $\{x\mid\varphi(x)\}$ is empty.
  2. For any countable model $M\models\text{ZFC}$ where $a=\{x\mid \varphi(x)\}$ and any $b\in M$ with $a\subset b$, there is a top extension $N$ of $M$ in which $N\models\{x\mid\varphi(x)\}=b$.

Proof. Let $\varphi(x)$ assert that the set $x$ is coded into the GCH pattern. We may assume that the coding mechanism of a set is marked off by certain kinds of failures of the GCH at odd-indexed alephs, with the pattern at intervening even-indexed regular cardinals forming the coding pattern.  This is $\Sigma_2$, since any large enough $V_\theta$ will reveal whether a given set $x$ is coded in this way. And because of the manner of coding, if the GCH holds, then no set is coded. Also, if the GCH holds eventually, then only a set-sized collection is coded. Finally, any countable model $M$ where only a set is coded can be top-extended to another model $N$ in which any desired superset of that set is coded. QED

Update.  Originally, I had proposed an argument for a negative answer to the question, and I was actually a bit disappointed by that, since I had hoped for a positive answer. However, it now seems to me that the argument I had written is wrong, and I am grateful to Ali Enayat for his remarks on this in the comments. I have now deleted the incorrect argument.

Meanwhile, here is a positive answer to the question in the case of models of $V\neq\newcommand\HOD{\text{HOD}}\HOD$.

Theorem. There is a $\Sigma_2$ formula $\varphi(x)$ with the following properties:

  1. In any model of $\newcommand\ZFC{\text{ZFC}}\ZFC+V\neq\HOD$, the class $\{x\mid\varphi(x)\}$ is a set.
  2. It is relatively consistent with $\ZFC$ that $\{x\mid\varphi(x)\}$ is empty; indeed, in any model of $\ZFC+\newcommand\GCH{\text{GCH}}\GCH$, the class $\{x\mid\varphi(x)\}$ is empty.
  3. If $M\models\ZFC$ thinks that $a=\{x\mid\varphi(x)\}$ is a set and $b\in M$ is a larger set $a\subset b$, then there is a top-extension $N$ of $M$ in which $\{x\mid \varphi(x)\}=b$.

Proof. Let $\varphi(x)$ hold, if there is some ordinal $\alpha$ such that every element of $V_\alpha$ is coded into the GCH pattern below some cardinal $\delta_\alpha$, with $\delta_\alpha$ as small as possible with that property, and $x$ is the next set coded into the GCH pattern above $\delta_\alpha$. This is a $\Sigma_2$ property, since it can be verified in any sufficiently large $V_\theta$.

In any model of $\ZFC+V\neq\HOD$, there must be some sets that are no coded into the $\GCH$ pattern, for if every set is coded that way then there would be a definable well-ordering of the universe and we would have $V=\HOD$. So in any model of $V\neq\HOD$, there is a bound on the ordinals $\alpha$ for which $\delta_\alpha$ exists, and therefore $\{x\mid\varphi(x)\}$ is a set. So statement (1) holds.

Statement (2) holds, because we may arrange it so that the GCH itself implies that no set is coded at all, and so $\varphi(x)$ would always fail.

For statement (3), suppose that $M\models\ZFC+\{x\mid\varphi(x)\}=a\subseteq b$ and $M$ is countable. In $M$, there must be some minimal rank $\alpha$ for which there is a set of rank $\alpha$ that is not coded into the GCH pattern. Let $N$ be an elementary top-extension of $M$, so $N$ agrees that $\alpha$ is that minimal rank. Now, by forcing over $N$, we can arrange to code all the sets of rank $\alpha$ into the GCH pattern above the height of the original model $M$, and we can furthermore arrange so as to code any given element of $b$ just above that coding. And so on, we can iterate it so as to arrange the coding above the height of $M$ so that exactly the elements of $b$ now satisfy $\varphi(x)$, but no more. In this way, we will ensure that $N\models\{x\mid\varphi(x)\}=b$, as desired. QED

I find the situation unusual, in that often results from the models-of-arithmetic context generalize to set theory with models of $V=\HOD$, because the global well-order means that models of $V=\HOD$ have definable Skolem functions, which is true in every model of arithmetic and which sometimes figures implicitly in constructions. But here, we have the result of Woodin’s theorem generalizing from models of arithmetic to models of $V\neq\HOD$.  Perhaps this suggests that we should expect a fully positive solution for models of set theory.

Further update. Woodin and I have now established the fully general result of the universal finite set, which subsumes much of the preliminary early analysis that I had earlier made in this post. Please see my post, The universal finite set.

The universal algorithm: a new simple proof of Woodin’s theorem

This is the third in a series of posts I’ve made recently concerning what I call the universal algorithm, which is a program that can in principle compute any function, if only you should run it in the right universe. Earlier, I had presented a few elementary proofs of this surprising theorem: see Every function can be computable! and A program that accepts exactly any desired finite set, in the right universe.

$\newcommand\PA{\text{PA}}$Those arguments established the universal algorithm, but they fell short of proving Woodin’s interesting strengthening of the theorem, which explains how the universal algorithm can be extended from any arithmetic universe to a larger one, in such a way so as to extend the given enumerated sequence in any desired manner. Woodin emphasized how his theorem raises various philosophical issues about the absoluteness or rather the non-absoluteness of finiteness, which I find extremely interesting.

Woodin’s proof, however, is a little more involved than the simple arguments I provided for the universal algorithm alone. Please see the paper Blanck, R., and Enayat, A. Marginalia on a theorem of Woodin, The Journal of Symbolic Logic, 82(1), 359-374, 2017. doi:10.1017/jsl.2016.8 for a further discussion of Woodin’s argument and related results.

What I’ve recently discovered, however, is that in fact one can prove Woodin’s stronger version of the theorem using only the method of the elementary argument. This variation also allows one to drop the countability requirement on the models, as was done by Blanck and Enayat. My thinking on this argument was greatly influenced by a comment of Vadim Kosoy on my original post.

It will be convenient to adopt an enumeration model of Turing computability, by which we view a Turing machine program as providing a means to computably enumerate a list of numbers. We start the program running, and it generates a list of numbers, possibly finite, possibly infinite, possibly empty, possibly with repetition. This way of using Turing machines is fully Turing equivalent to the usual way, if one simply imagines enumerating input/output pairs so as to code any given computable partial function.

Theorem.(Woodin) There is a Turing machine program $e$ with the following properties.

  1. $\PA$ proves that $e$ enumerates a finite sequence of numbers.
  2. For any finite sequence $s$, there is a model $M$ of $\PA$ in which program $e$ enumerates exactly $s$.
  3. For any model $M$ in which $e$ enumerates a (possibly nonstandard) sequence $s$ and any $t\in M$ extending $s$, there is an end-extension $N$ of $M$ in which $e$ enumerates exactly $t$.

It is statement (3) that makes this theorem stronger than merely the universal algorithm that I mentioned in my earlier posts and which I find particularly to invite philosophical speculation on the provisional nature of finiteness. After all, if in one universe the program $e$ enumerates a finite sequence $s$, then for any $t$ extending $s$ — we might imagine having painted some new pattern $t$ on top of $s$ — there is a taller universe in which $e$ enumerates exactly $t$. So we need only wait long enough (into the next universe), and then our program $e$ will enumerate exactly the sequence $t$ we had desired.

Proof. This is the new elementary proof.  Let’s begin by recalling the earlier proof of the universal algorithm, for statements (1) and (2) only. Namely, let $e$ be the program that undertakes a systematic exhaustive search through all proofs from $\PA$ for a proof of a statement of the form, “program $e$ does not enumerate exactly the sequence $s$,” where $s$ is an explicitly listed finite sequence of numbers. Upon finding such a proof (the first such proof found), it proceeds to enumerate exactly the numbers appearing in $s$.  Thus, at bottom, the program $e$ is a petulant child: it searches for a proof that it shouldn’t behave in a certain way, and then proceeds at once to behave in exactly the forbidden manner.

(The reader may notice an apparent circularity in the definition of program $e$, since we referred to $e$ when defining $e$. But this is no problem at all, and it is a standard technique in computability theory to use the Kleene recursion theorem to show that this kind of definition is completely fine. Namely, we really define a program $f(e)$ that performs that task, asking about $e$, and then by the recursion theorem, there is a program $e$ such that $e$ and $f(e)$ compute the same function, provably so. And so for this fixed-point program $e$, it is searching for proofs about itself.)

It is clear that the program $e$ will enumerate a finite list of numbers only, since either it never finds the sought-after proof, in which case it enumerates nothing, and the empty sequence is finite, or else it does find a proof, in which case it enumerates exactly the finitely many numbers explicitly appearing in the statement that was proved. So $\PA$ proves that in any case $e$ enumerates a finite list. Further, if $\PA$ is consistent, then you will not be able to refute any particular finite sequence being enumerated by $e$, because if you could, then (for the smallest such instance) the program $e$ would in fact enumerate exactly those numbers, and this would be provable, contradicting $\text{Con}(\PA)$. Precisely because you cannot refute that statement, it follows that the theory $\PA$ plus the assertion that $e$ enumerates exactly $s$ is consistent, for any particular $s$. So there is a model $M$ of $\PA$ in which program $e$ enumerates exactly $s$. This establishes statements (1) and (2) for this program.

Let me now modify the program in order to achieve the key third property. Note that the program described above definitely does not have property (3), since once a nonempty sequence $s$ is enumerated, then the program is essentially finished, and so running it in a taller universe $N$ will not affect the sequence it enumerates. To achieve (3), therefore, we modify the program by allowing it to add more to the sequence.

Specfically, for the new modified version of the program $e$, we start as before by searching for a proof in $\PA$ that the list enumerated by $e$ is not exactly some explicitly listed finite sequence $s$. When this proof is found, then $e$ immediately enumerates the numbers appearing in $s$. Next, it inspects the proof that it had found. Since the proof used only finitely many $\PA$ axioms, it is therefore a proof from a certain fragment $\PA_n$, the $\Sigma_n$ fragment of $\PA$. Now, the algorithm $e$ continues by searching for a proof in a strictly smaller fragment that program $e$ does not enumerate exactly some explicitly listed sequence $t$ properly extending the sequence of numbers already enumerated. When such a proof is found, it then immediately enumerates (the rest of) those numbers. And now simply iterate this, looking for new proofs in still-smaller fragments of $\PA$ that a still-longer extension is not the sequence enumerated by $e$.

Succinctly: the program $e$ searches for a proof, in a strictly smaller fragment of $\PA$ each time, that $e$ does not enumerate exactly a certain explicitly listed sequence $s$ extending whatever has been already enumerated so far, and when found, it enumerates those new elements, and repeats.

We can still prove in $\PA$ that $e$ enumerates a finite sequence, since the fragment of $\PA$ that is used each time is going down, and $\PA$ proves that this can happen only finitely often. So statement (1) holds.

Again, you cannot refute that any particular finite sequence $s$ is the sequence enumerated by $e$, since if you could do this, then in the standard model, the program would eventually find such a proof, and then perhaps another and another, until ultimately, it would find some last proof that $e$ does not enumerate exactly some finite sequence $t$, at which time the program will have enumerated exactly $t$ and never afterward add to it. So that proof would have proved a false statement. This is a contradiction, since that proof is standard.

So again, precisely because you cannot refute these statements, it follows that it is consistent with $\PA$ that program $e$ enumerates exactly $s$, for any particular finite $s$. So statement (2) holds.

Finally, for statement (3), suppose that $M$ is a model of $\PA$ in which $e$ enumerates exactly some finite sequence $s$. If $s$ is the empty sequence, then $M$ thinks that there is no proof in $\PA$ that $e$ does not enumerate exactly $t$, for any particular $t$. And so it thinks the theory $\PA+$ “$e$ enumerates exactly $t$” is consistent. So in $M$ we may build the Henkin model $N$ of this theory, which is an end-extension of $M$ in which $e$ enumerates exactly $t$, as desired.

If alternatively $s$ was nonempty, then it was enumerated by $e$ in $M$ because in $M$ there was ultimately a proof in some fragment $\PA_n$ that it should not do so, but it never found a corresponding proof about an extension of $s$ in any strictly smaller fragment of $\PA$.  So $M$ has a proof from $\PA_n$ that $e$ does not enumerate exactly $s$, even though it did.

Notice that $n$ must be nonstandard, because $M$ has a definable $\Sigma_k$-truth predicate for every standard $k$, and using this predicate, $M$ can see that every $\PA_k$-provable statement must be true.

Precisely because the model $M$ lacked the proofs from the strictly smaller fragment $\PA_{n-1}$, it follows that for any particular finite $t$ extending $s$ in $M$, the model thinks that the theory $T=\PA_{n-1}+$ “$e$ enumerates exactly $t$” is consistent. Since $n$ is nonstandard, this theory includes all the actual $\PA$ axioms. In $M$ we can build the Henkin model $N$ of this theory, which will be an end-extension of $M$ in which $\PA$ holds and program $e$ enumerates exactly $t$, as desired for statement (3). QED

Corollary. Let $e$ be the universal algorithm program $e$ of the theorem. Then

  1. For any infinite sequence $S:\mathbb{N}\to\mathbb{N}$, there is a model $M$ of $\PA$ in which program $e$ enumerates a (nonstandard finite) sequence starting with $S$.
  2. If $M$ is any model of $\PA$ in which program $e$ enumerates some (possibly nonstandard) finite sequence $s$, and $S$ is any $M$-definable infinite sequence extending $s$, then there is an end-extension of $M$ in which $e$ enumerates a sequence starting with $S$.

Proof. (1) Fix $S:\mathbb{N}\to\mathbb{N}$. By a simple compactness argument, there is a model $M$ of true arithmetic in which the sequence $S$ is the standard part of some coded nonstandard finite sequence $t$. By the main theorem, there is some end-extension $M^+$ of $M$ in which $e$ enumerates $t$, which extends $S$, as desired.

(2) If $e$ enumerates $s$ in $M$, a model of $\PA$, and $S$ is an $M$-infinite sequence definable in $M$, then by a compactness argument inside $M$, we can build a model $M’$ inside $M$ in which $S$ is coded by an element, and then apply the main theorem to find a further end-extension $M^+$ in which $e$ enumerates that element, and hence which enumerates an extension of $S$. QED