The universal definition — it can define any mathematical object you like, in the right set-theoretic universe

Alex_wong_2015_(Unsplash) (1)In set theory, we have the phenomenon of the universal definition. This is a property $\phi(x)$, first-order expressible in the language of set theory, that necessarily holds of exactly one set, but which can in principle define any particular desired set that you like, if one should simply interpret the definition in the right set-theoretic universe. So $\phi(x)$ could be defining the set of real numbes $x=\mathbb{R}$ or the integers $x=\mathbb{Z}$ or the number $x=e^\pi$ or a certain group or a certain topological space or whatever set you would want it to be. For any mathematical object $a$, there is a set-theoretic universe in which $a$ is the unique object $x$ for which $\phi(x)$.

The universal definition can be viewed as a set-theoretic analogue of the universal algorithm, a topic on which I have written several recent posts:

Let’s warm up with the following easy instance.

Theorem. Any particular real number $r$ can become definable in a forcing extension of the universe.

Proof. By Easton’s theorem, we can control the generalized continuum hypothesis precisely on the regular cardinals, and if we start (by forcing if necessary) in a model of GCH, then there is a forcing extension where $2^{\aleph_n}=\aleph_{n+1}$ just in case the $n^{th}$ binary digit of $r$ is $1$. In the resulting forcing extension $V[G]$, therefore, the real $r$ is definable as: the real whose binary digits conform with the GCH pattern on the cardinals $\aleph_n$. QED

Since this definition can be settled in a rank-initial segment of the universe, namely, $V_{\omega+\omega}$, the complexity of the definition is $\Delta_2$. See my post on Local properties in set theory to see how I think about locally verifiable and locally decidable properties in set theory.

If we push the argument just a little, we can go beyond the reals.

Theorem. There is a formula $\psi(x)$, of complexity $\Sigma_2$, such that for any particular object $a$, there is a forcing extension of the universe in which $\psi$ defines $a$.

Proof. Fix any set $a$. By the axiom of choice, we may code $a$ with a set of ordinals $A\subset\kappa$ for some cardinal $\kappa$. (One well-orders the transitive closure of $\{a\}$ and thereby finds a bijection $\langle\mathop{tc}(\{a\}),\in\rangle\cong\langle\kappa,E\rangle$ for some $E\subset\kappa\times\kappa$, and then codes $E$ to a set $A$ by an ordinal pairing function. The set $A$ tells you $E$, which tells you $\mathop{tc}(\{a\})$ by the Mostowski collapse, and from this you find $a$.) By Easton’s theorem, there is a forcing extension $V[G]$ in which the GCH holds at all $\aleph_{\lambda+1}$ for a limit ordinal $\lambda<\kappa$, but fails at $\aleph_{\kappa+1}$, and such that $\alpha\in A$ just in case $2^{\alpha_{\alpha+2}}=\aleph_{\alpha+3}$ for $\alpha<\kappa$. That is, we manipulate the GCH pattern to exactly code both $\kappa$ and the elements of $A\subset\kappa$. Let $\phi(x)$ assert that $x$ is the set that is decoded by this process: look for the first stage where the GCH fails at $\aleph_{\lambda+2}$, and then extract the set $A$ of ordinals, and then check if $x$ is the set coded by $A$. The assertion $\phi(x)$ did not depend on $a$, and since it can be verified in any sufficiently large $V_\theta$, the assertion $\phi(x)$ has complexity $\Sigma_2$. QED

Let’s try to make a better universal definition. As I mentioned at the outset, I have been motivated to find a set-theoretic analogue of the universal algorithm, and in that computable context, we had a universal algorithm that could not only produce any desired finite set, when run in the right universe, but which furthermore had a robust interaction between models of arithmetic and their top-extensions: any set could be extended to any other set for which the algorithm enumerated it in a taller universe. Here, I’d like to achieve the same robustness of interaction with the universal definition, as one moves from one model of set theory to a taller model. We say that one model of set theory $N$ is a top-extension of another $M$, if all the new sets of $N$ have rank totally above the ranks occuring in $M$. Thus, $M$ is a rank-initial segment of $N$. If there is a least new ordinal $\beta$ in $N\setminus M$, then this is equivalent to saying that $M=V_\beta^N$.

Theorem. There is a formula $\phi(x)$, such that

  1. In any model of ZFC, there is a unique set $a$ satisfying $\phi(a)$.
  2. For any countable model $M\models\text{ZFC}$ and any $a\in M$, there is a top-extension $N$ of $M$ such that $N\models \phi(a)$.

Thus, $\phi(x)$ is the universal definition: it always defines some set, and that set can be any desired set, even when moving from a model $M$ to a top-extension $N$.

Proof. The previous manner of coding will not achieve property 2, since the GCH pattern coding started immediately, and so it would be preserved to any top extension. What we need to do is to place the coding much higher in the universe, so that in the top extension $N$, it will occur in the part of $N$ that is totally above $M$.

But consider the following process. In any model of set theory, let $\phi(x)$ assert that $x$ is the empty set unless the GCH holds at all sufficiently large cardinals, and indeed $\phi(x)$ is false unless there is a cardinal $\delta$ and ordinal $\gamma<\delta^+$ such that the GCH holds at all cardinals above $\aleph_{\delta+\gamma}$. In this case, let $\delta$ be the smallest such cardinal for which that is true, and let $\gamma$ be the smallest ordinal working with this $\delta$. So both $\delta$ and $\gamma$ are definable. Now, let $A\subset\gamma$ be the set of ordinals $\alpha$ for which the GCH holds at $\aleph_{\delta+\alpha+1}$, and let $\phi(x)$ assert that $x$ is the set coded by the set $A$.

It is clear that $\phi(x)$ defines a unique set, in any model of ZFC, and so (1) holds. For (2), suppose that $M$ is a countable model of ZFC and $a\in M$. It is a fact that every countable model of ZFC has a top-extension, by the definable ultrapower method. Let $N_0$ be a top extension of $M$. Let $N=N_0[G]$ be a forcing extension of $N_0$ in which the set $a$ is coded into the GCH pattern very high up, at cardinals totally above $M$, and such that the GCH holds above this coding, in such a way that the process described in the previous paragraph would define exactly the set $a$. So $\phi(a)$ holds in $N$, which is a top-extension of $M$ as no new sets of small rank are added by the forcing. So statement (2) also holds. QED

The complexity of the definition is $\Pi_3$, mainly because in order to know where to look for the coding, one needs to know the ordinals $\delta$ and $\gamma$, and so one needs to know that the GCH always holds above that level. This is a $\Pi_3$ property, since it cannot be verified locally only inside some $V_\theta$.

A stronger analogue with the universal algorithm — and this is a question that motivated my thinking about this topic — would be something like the following:

Question. Is there is a $\Sigma_2$ formula $\varphi(x)$, that is, a locally verifiable property, with the following properties?

  1. In any model of ZFC, the class $\{x\mid\varphi(x)\}$ is a set.
  2. It is consistent with ZFC that $\{x\mid\varphi(x)\}$ is empty.
  3. In any countable model $M\models\text{ZFC}$ in which $\{x\mid\varphi(x)\}=a$ and any set $b\in M$ with $a\subset b$, then there is a top-extension $N$ of $M$ in which $\{x\mid\varphi(x)\}=b$.

An affirmative answer would be a very strong analogue with the universal algorithm and Woodin’s theorem about which I wrote previously. The idea is that the $\Sigma_2$ properties $\varphi(x)$ in set theory are analogous to the computably enumerable properties in computability theory. Namely, to verify that an object has a certain computably enumerable property, we run a particular computable process and then sit back, waiting for the process to halt, until a stage of computation arrives at which the property is verified. Similarly, in set theory, to verify that a set has a particular $\Sigma_2$ property, we sit back watching the construction of the cumulative set-theoretic universe, until a stage $V_\beta$ arrives that provides verification of the property. This is why in statement (3) we insist that $a\subset b$, since the $\Sigma_2$ properties are always upward absolute to top-extensions; once an object is placed into $\{x\mid\varphi(x)\}$, then it will never be removed as one makes the universe taller.

So the hope was that we would be able to find such a universal $\Sigma_2$ definition, which would serve as a set-theoretic analogue of the universal algorithm used in Woodin’s theorem.

If one drops the first requirement, and allows $\{x\mid \varphi(x)\}$ to sometimes be a proper class, then one can achieve a positive answer as follows.

Theorem. There is a $\Sigma_2$ formula $\varphi(x)$ with the following properties.

  1. If the GCH holds, then $\{x\mid\varphi(x)\}$ is empty.
  2. For any countable model $M\models\text{ZFC}$ where $a=\{x\mid \varphi(x)\}$ and any $b\in M$ with $a\subset b$, there is a top extension $N$ of $M$ in which $N\models\{x\mid\varphi(x)\}=b$.

Proof. Let $\varphi(x)$ assert that the set $x$ is coded into the GCH pattern. We may assume that the coding mechanism of a set is marked off by certain kinds of failures of the GCH at odd-indexed alephs, with the pattern at intervening even-indexed regular cardinals forming the coding pattern.  This is $\Sigma_2$, since any large enough $V_\theta$ will reveal whether a given set $x$ is coded in this way. And because of the manner of coding, if the GCH holds, then no set is coded. Also, if the GCH holds eventually, then only a set-sized collection is coded. Finally, any countable model $M$ where only a set is coded can be top-extended to another model $N$ in which any desired superset of that set is coded. QED

Update.  Originally, I had proposed an argument for a negative answer to the question, and I was actually a bit disappointed by that, since I had hoped for a positive answer. However, it now seems to me that the argument I had written is wrong, and I am grateful to Ali Enayat for his remarks on this in the comments. I have now deleted the incorrect argument.

Meanwhile, here is a positive answer to the question in the case of models of $V\neq\newcommand\HOD{\text{HOD}}\HOD$.

Theorem. There is a $\Sigma_2$ formula $\varphi(x)$ with the following properties:

  1. In any model of $\newcommand\ZFC{\text{ZFC}}\ZFC+V\neq\HOD$, the class $\{x\mid\varphi(x)\}$ is a set.
  2. It is relatively consistent with $\ZFC$ that $\{x\mid\varphi(x)\}$ is empty; indeed, in any model of $\ZFC+\newcommand\GCH{\text{GCH}}\GCH$, the class $\{x\mid\varphi(x)\}$ is empty.
  3. If $M\models\ZFC$ thinks that $a=\{x\mid\varphi(x)\}$ is a set and $b\in M$ is a larger set $a\subset b$, then there is a top-extension $N$ of $M$ in which $\{x\mid \varphi(x)\}=b$.

Proof. Let $\varphi(x)$ hold, if there is some ordinal $\alpha$ such that every element of $V_\alpha$ is coded into the GCH pattern below some cardinal $\delta_\alpha$, with $\delta_\alpha$ as small as possible with that property, and $x$ is the next set coded into the GCH pattern above $\delta_\alpha$. This is a $\Sigma_2$ property, since it can be verified in any sufficiently large $V_\theta$.

In any model of $\ZFC+V\neq\HOD$, there must be some sets that are no coded into the $\GCH$ pattern, for if every set is coded that way then there would be a definable well-ordering of the universe and we would have $V=\HOD$. So in any model of $V\neq\HOD$, there is a bound on the ordinals $\alpha$ for which $\delta_\alpha$ exists, and therefore $\{x\mid\varphi(x)\}$ is a set. So statement (1) holds.

Statement (2) holds, because we may arrange it so that the GCH itself implies that no set is coded at all, and so $\varphi(x)$ would always fail.

For statement (3), suppose that $M\models\ZFC+\{x\mid\varphi(x)\}=a\subseteq b$ and $M$ is countable. In $M$, there must be some minimal rank $\alpha$ for which there is a set of rank $\alpha$ that is not coded into the GCH pattern. Let $N$ be an elementary top-extension of $M$, so $N$ agrees that $\alpha$ is that minimal rank. Now, by forcing over $N$, we can arrange to code all the sets of rank $\alpha$ into the GCH pattern above the height of the original model $M$, and we can furthermore arrange so as to code any given element of $b$ just above that coding. And so on, we can iterate it so as to arrange the coding above the height of $M$ so that exactly the elements of $b$ now satisfy $\varphi(x)$, but no more. In this way, we will ensure that $N\models\{x\mid\varphi(x)\}=b$, as desired. QED

I find the situation unusual, in that often results from the models-of-arithmetic context generalize to set theory with models of $V=\HOD$, because the global well-order means that models of $V=\HOD$ have definable Skolem functions, which is true in every model of arithmetic and which sometimes figures implicitly in constructions. But here, we have the result of Woodin’s theorem generalizing from models of arithmetic to models of $V\neq\HOD$.  Perhaps this suggests that we should expect a fully positive solution for models of set theory.

The universal algorithm: a new simple proof of Woodin’s theorem

This is the third in a series of posts I’ve made recently concerning what I call the universal algorithm, which is a program that can in principle compute any function, if only you should run it in the right universe. Earlier, I had presented a few elementary proofs of this surprising theorem: see Every function can be computable! and A program that accepts exactly any desired finite set, in the right universe.

$\newcommand\PA{\text{PA}}$Those arguments established the universal algorithm, but they fell short of proving Woodin’s interesting strengthening of the theorem, which explains how the universal algorithm can be extended from any arithmetic universe to a larger one, in such a way so as to extend the given enumerated sequence in any desired manner. Woodin emphasized how his theorem raises various philosophical issues about the absoluteness or rather the non-absoluteness of finiteness, which I find extremely interesting.

Woodin’s proof, however, is a little more involved than the simple arguments I provided for the universal algorithm alone. Please see the paper Blanck, R., and Enayat, A. Marginalia on a theorem of Woodin, The Journal of Symbolic Logic, 82(1), 359-374, 2017. doi:10.1017/jsl.2016.8 for a further discussion of Woodin’s argument and related results.

What I’ve recently discovered, however, is that in fact one can prove Woodin’s stronger version of the theorem using only the method of the elementary argument. This variation also allows one to drop the countability requirement on the models, as was done by Blanck and Enayat. My thinking on this argument was greatly influenced by a comment of Vadim Kosoy on my original post.

It will be convenient to adopt an enumeration model of Turing computability, by which we view a Turing machine program as providing a means to computably enumerate a list of numbers. We start the program running, and it generates a list of numbers, possibly finite, possibly infinite, possibly empty, possibly with repetition. This way of using Turing machines is fully Turing equivalent to the usual way, if one simply imagines enumerating input/output pairs so as to code any given computable partial function.

Theorem.(Woodin) There is a Turing machine program $e$ with the following properties.

  1. $\PA$ proves that $e$ enumerates a finite sequence of numbers.
  2. For any finite sequence $s$, there is a model $M$ of $\PA$ in which program $e$ enumerates exactly $s$.
  3. For any model $M$ in which $e$ enumerates a (possibly nonstandard) sequence $s$ and any $t\in M$ extending $s$, there is an end-extension $N$ of $M$ in which $e$ enumerates exactly $t$.

It is statement (3) that makes this theorem stronger than merely the universal algorithm that I mentioned in my earlier posts and which I find particularly to invite philosophical speculation on the provisional nature of finiteness. After all, if in one universe the program $e$ enumerates a finite sequence $s$, then for any $t$ extending $s$ — we might imagine having painted some new pattern $t$ on top of $s$ — there is a taller universe in which $e$ enumerates exactly $t$. So we need only wait long enough (into the next universe), and then our program $e$ will enumerate exactly the sequence $t$ we had desired.

Proof. This is the new elementary proof.  Let’s begin by recalling the earlier proof of the universal algorithm, for statements (1) and (2) only. Namely, let $e$ be the program that undertakes a systematic exhaustive search through all proofs from $\PA$ for a proof of a statement of the form, “program $e$ does not enumerate exactly the sequence $s$,” where $s$ is an explicitly listed finite sequence of numbers. Upon finding such a proof (the first such proof found), it proceeds to enumerate exactly the numbers appearing in $s$.  Thus, at bottom, the program $e$ is a petulant child: it searches for a proof that it shouldn’t behave in a certain way, and then proceeds at once to behave in exactly the forbidden manner.

(The reader may notice an apparent circularity in the definition of program $e$, since we referred to $e$ when defining $e$. But this is no problem at all, and it is a standard technique in computability theory to use the Kleene recursion theorem to show that this kind of definition is completely fine. Namely, we really define a program $f(e)$ that performs that task, asking about $e$, and then by the recursion theorem, there is a program $e$ such that $e$ and $f(e)$ compute the same function, provably so. And so for this fixed-point program $e$, it is searching for proofs about itself.)

It is clear that the program $e$ will enumerate a finite list of numbers only, since either it never finds the sought-after proof, in which case it enumerates nothing, and the empty sequence is finite, or else it does find a proof, in which case it enumerates exactly the finitely many numbers explicitly appearing in the statement that was proved. So $\PA$ proves that in any case $e$ enumerates a finite list. Further, if $\PA$ is consistent, then you will not be able to refute any particular finite sequence being enumerated by $e$, because if you could, then (for the smallest such instance) the program $e$ would in fact enumerate exactly those numbers, and this would be provable, contradicting $\text{Con}(\PA)$. Precisely because you cannot refute that statement, it follows that the theory $\PA$ plus the assertion that $e$ enumerates exactly $s$ is consistent, for any particular $s$. So there is a model $M$ of $\PA$ in which program $e$ enumerates exactly $s$. This establishes statements (1) and (2) for this program.

Let me now modify the program in order to achieve the key third property. Note that the program described above definitely does not have property (3), since once a nonempty sequence $s$ is enumerated, then the program is essentially finished, and so running it in a taller universe $N$ will not affect the sequence it enumerates. To achieve (3), therefore, we modify the program by allowing it to add more to the sequence.

Specfically, for the new modified version of the program $e$, we start as before by searching for a proof in $\PA$ that the list enumerated by $e$ is not exactly some explicitly listed finite sequence $s$. When this proof is found, then $e$ immediately enumerates the numbers appearing in $s$. Next, it inspects the proof that it had found. Since the proof used only finitely many $\PA$ axioms, it is therefore a proof from a certain fragment $\PA_n$, the $\Sigma_n$ fragment of $\PA$. Now, the algorithm $e$ continues by searching for a proof in a strictly smaller fragment that program $e$ does not enumerate exactly some explicitly listed sequence $t$ properly extending the sequence of numbers already enumerated. When such a proof is found, it then immediately enumerates (the rest of) those numbers. And now simply iterate this, looking for new proofs in still-smaller fragments of $\PA$ that a still-longer extension is not the sequence enumerated by $e$.

Succinctly: the program $e$ searches for a proof, in a strictly smaller fragment of $\PA$ each time, that $e$ does not enumerate exactly a certain explicitly listed sequence $s$ extending whatever has been already enumerated so far, and when found, it enumerates those new elements, and repeats.

We can still prove in $\PA$ that $e$ enumerates a finite sequence, since the fragment of $\PA$ that is used each time is going down, and $\PA$ proves that this can happen only finitely often. So statement (1) holds.

Again, you cannot refute that any particular finite sequence $s$ is the sequence enumerated by $e$, since if you could do this, then in the standard model, the program would eventually find such a proof, and then perhaps another and another, until ultimately, it would find some last proof that $e$ does not enumerate exactly some finite sequence $t$, at which time the program will have enumerated exactly $t$ and never afterward add to it. So that proof would have proved a false statement. This is a contradiction, since that proof is standard.

So again, precisely because you cannot refute these statements, it follows that it is consistent with $\PA$ that program $e$ enumerates exactly $s$, for any particular finite $s$. So statement (2) holds.

Finally, for statement (3), suppose that $M$ is a model of $\PA$ in which $e$ enumerates exactly some finite sequence $s$. If $s$ is the empty sequence, then $M$ thinks that there is no proof in $\PA$ that $e$ does not enumerate exactly $t$, for any particular $t$. And so it thinks the theory $\PA+$ “$e$ enumerates exactly $t$” is consistent. So in $M$ we may build the Henkin model $N$ of this theory, which is an end-extension of $M$ in which $e$ enumerates exactly $t$, as desired.

If alternatively $s$ was nonempty, then it was enumerated by $e$ in $M$ because in $M$ there was ultimately a proof in some fragment $\PA_n$ that it should not do so, but it never found a corresponding proof about an extension of $s$ in any strictly smaller fragment of $\PA$.  So $M$ has a proof from $\PA_n$ that $e$ does not enumerate exactly $s$, even though it did.

Notice that $n$ must be nonstandard, because $M$ has a definable $\Sigma_k$-truth predicate for every standard $k$, and using this predicate, $M$ can see that every $\PA_k$-provable statement must be true.

Precisely because the model $M$ lacked the proofs from the strictly smaller fragment $\PA_{n-1}$, it follows that for any particular finite $t$ extending $s$ in $M$, the model thinks that the theory $T=\PA_{n-1}+$ “$e$ enumerates exactly $t$” is consistent. Since $n$ is nonstandard, this theory includes all the actual $\PA$ axioms. In $M$ we can build the Henkin model $N$ of this theory, which will be an end-extension of $M$ in which $\PA$ holds and program $e$ enumerates exactly $t$, as desired for statement (3). QED

Corollary. Let $e$ be the universal algorithm program $e$ of the theorem. Then

  1. For any infinite sequence $S:\mathbb{N}\to\mathbb{N}$, there is a model $M$ of $\PA$ in which program $e$ enumerates a (nonstandard finite) sequence starting with $S$.
  2. If $M$ is any model of $\PA$ in which program $e$ enumerates some (possibly nonstandard) finite sequence $s$, and $S$ is any $M$-definable infinite sequence extending $s$, then there is an end-extension of $M$ in which $e$ enumerates a sequence starting with $S$.

Proof. (1) Fix $S:\mathbb{N}\to\mathbb{N}$. By a simple compactness argument, there is a model $M$ of true arithmetic in which the sequence $S$ is the standard part of some coded nonstandard finite sequence $t$. By the main theorem, there is some end-extension $M^+$ of $M$ in which $e$ enumerates $t$, which extends $S$, as desired.

(2) If $e$ enumerates $s$ in $M$, a model of $\PA$, and $S$ is an $M$-infinite sequence definable in $M$, then by a compactness argument inside $M$, we can build a model $M’$ inside $M$ in which $S$ is coded by an element, and then apply the main theorem to find a further end-extension $M^+$ in which $e$ enumerates that element, and hence which enumerates an extension of $S$. QED

Models of set theory with the same reals and the same cardinals, but which disagree on the continuum hypothesis

Terry_Marks,_Nightmare_in_a_MirrorI’d like to describe a certain interesting and surprising situation that can happen with models of set theory.

Theorem. If $\newcommand\ZFC{\text{ZFC}}\ZFC$ set theory is consistent, then there are two models of $\ZFC$ set theory $M$ and $N$ for which

  • $M$ and $N$ have the same real numbers $$\newcommand\R{\mathbb{R}}\R^M=\R^N.$$
  • $M$ and $N$ have the ordinals and the same cardinals $$\forall\alpha\qquad \aleph_\alpha^M=\aleph_\alpha^N$$
  • But $M$ thinks that the continuum hypothesis $\newcommand\CH{\text{CH}}\CH$ is true, while $N$ thinks that $\CH$ is false.

This is a little strange, since the two models have the set $\R$ in common and they agree on the cardinal numbers, but $M$ thinks that $\R$ has size $\aleph_1$ and $N$ will think that $\R$ has size $\aleph_2$.  In particular, $M$ can well-order the reals in order type $\omega_1$ and $N$ can do so in order-type $\omega_2$, even though the two models have the same reals and they agree that these order types have different cardinalities.

Another abstract way to describe what is going on is that even if two models of set theory, even transitive models, agree on which ordinals are cardinals, they needn’t agree on which sets are equinumerous, for sets they have in common, even for the reals.

Let me emphasize that it is the requirement that the models have the same cardinals that makes the problem both subtle and surprising. If you drop that requirement, then the problem is an elementary exercise in forcing: start with any model $V$, and first force $\CH$ to fail in $V[H]$ by adding a lot of Cohen reals, then force to $V[G]$ by collapsing the continuum to $\aleph_1$. This second step adds no new reals and forces $\CH$, and so $V[G]$ and $V[H]$ will have the same reals, while $V[H]$ thinks $\CH$ is true and $V[G]$ thinks $\CH$ is false. The problem becomes nontrivial and interesting mainly when you insist that cardinals are not collapsed.

In fact, the situation described in the theorem can be forced over any given model of $\ZFC$.

Theorem. Every model of set theory $V\models\ZFC$ has two set-forcing extensions $V[G]$ and $V[H]$ for which

  • $V[G]$ and $V[H]$ have the same real numbers $$\newcommand\R{\mathbb{R}}\R^{V[G]}=\R^{V[H]}.$$
  • $V[G]$ and $V[H]$ have the same cardinals $$\forall\alpha\qquad \aleph_\alpha^{V[G]}=\aleph_\alpha^{V[H]}$$
  • But $V[G]$ thinks that the continuum hypothesis $\CH$ is true, while $V[H]$ thinks that $\CH$ is false.

Proof. Start in any model $V\models\ZFC$, and by forcing if necessary, let’s assume $\CH$ holds in $V$. Let $H\subset\text{Add}(\omega,\omega_2)$ be $V$-generic for the forcing to add $\omega_2$ many Cohen reals. So $V[H]$ satisfies $\neg\CH$ and has the same ordinals and cardinals as $V$.

Next, force over $V[H]$ using the forcing from $V$ to collapse $\omega_2$ to $\omega_1$, forming the extension $V[H][g]$, where $g$ is the generic bijection between those ordinals. Since we used the forcing in $V$, which is countably closed there, it makes sense to consider $V[g]$.  In this extension, the forcing $\text{Add}(\omega,\omega_1^V)$ and $\text{Add}(\omega,\omega_2^V)$ are isomorphic. Since $H$ is $V[g]$-generic for the latter, let $G=g\mathrel{“}H$ be the image of this filter in $\text{Add}(\omega,\omega_1)$, which is therefore $V[g]$-generic for the former. So $V[g][G]=V[g][H]$. Since the forcing $\text{Add}(\omega,\omega_1)$ is c.c.c., it follows that $V[G]$ also has the same cardinals as $V$ and hence also the same as in $V[H]$.

If we now view these extensions as $V[G][g]=V[H][g]$ and note that the coutable closure of $g$ in $V$ implies that $g$ adds no new reals over either $V[G]$ or $V[H]$, it follows that $\R^{V[G]}=\R^{V[H]}$. So the two models have the same reals and the same cardinals. But $V[G]$ has $\CH$ and $V[H]$ has $\neg\CH$, in light of the forcing, and so the proof is complete. QED

Let me prove the following surprising generalization.

Theorem. If $V$ is any model of $\ZFC$ and $V[G]$ is the forcing extension obtained by adding $\kappa$ many Cohen reals, for some uncountable $\kappa$, then for any other uncountable cardinal $\lambda$, there is another forcing extension $V[H]$ where $H$ is $V$-generic for the forcing to add $\lambda$ many Cohen reals, yet $\R^{V[G]}=\R^{V[H]}$.

Proof. Start in $V[G]$, and let $g$ be $V[G]$-generic to collapse $\lambda$ to $\kappa$, using the collapse forcing of the ground model $V$. This forcing is countably closed in $V$ and therefore does not add reals over $V[G]$. In $V[g]$, the two forcing notions $\text{Add}(\omega,\kappa)$ and $\text{Add}(\omega,\lambda)$ are isomorphic. Thus, since $G$ is $V[g]$-generic for the former poset, it follows that the image $H=g\mathrel{“}G$ is $V[g]$-generic for the latter poset. So $V[H]$ is generic over $V$ for adding $\lambda$ many Cohen reals. By construction, we have $V[G][g]=V[H][g]$, and since $g$ doesn’t add reals, it follows that $\R^{V[G]}=\R^{V[H]}$, as desired. QED

I have a vague recollection of having first heard of this problem many years ago, perhaps as a graduate student, although I don’t quite recall where it was or indeed what the construction was — the argument above is my reconstruction (which I have updated and extended from my initial post). If someone could provide a reference in the comments for due credit, I’d be appreciative.  The problem appeared a few years ago on MathOverflow.

A program that accepts exactly any desired finite set, in the right universe

One_small_step_(3598325560)Last year I made a post about the universal program, a Turing machine program $p$ that can in principle compute any desired function, if it is only run inside a suitable model of set theory or arithmetic.  Specifically, there is a program $p$, such that for any function $f:\newcommand\N{\mathbb{N}}\N\to\N$, there is a model $M\models\text{PA}$ — or of $\text{ZFC}$, whatever theory you like — inside of which program $p$ on input $n$ gives output $f(n)$.

This theorem is related to a very interesting theorem of W. Hugh Woodin’s, which says that there is a program $e$ such that $\newcommand\PA{\text{PA}}\PA$ proves $e$ accepts only finitely many inputs, but such that for any finite set $A\subset\N$, there is a model of $\PA$ inside of which program $e$ accepts exactly the elements of $A$. Actually, Woodin’s theorem is a bit stronger than this in a way that I shall explain.

Victoria Gitman gave a very nice talk today on both of these theorems at the special session on Computability theory: Pushing the Boundaries at the AMS sectional meeting here in New York, which happens to be meeting right here in my east midtown neighborhood, a few blocks from my home.

What I realized this morning, while walking over to Vika’s talk, is that there is a very simple proof of the version of Woodin’s theorem stated above.  The idea is closely related to an idea of Vadim Kosoy mentioned in my post last year. In hindsight, I see now that this idea is also essentially present in Woodin’s proof of his theorem, and indeed, I find it probable that Woodin had actually begun with this idea and then modified it in order to get the stronger version of his result that I shall discuss below.

But in the meantime, let me present the simple argument, since I find it to be very clear and the result still very surprising.

Theorem. There is a Turing machine program $e$, such that

  1. $\PA$ proves that $e$ accepts only finitely many inputs.
  2. For any particular finite set $A\subset\N$, there is a model $M\models\PA$ such that inside $M$, the program $e$ accepts all and only the elements of $A$.
  3. Indeed, for any set $A\subset\N$, including infinite sets, there is a model $M\models\PA$ such that inside $M$, program $e$ accepts $n$ if and only if $n\in A$.

Proof.  The program $e$ simply performs the following task:  on any input $n$, search for a proof from $\PA$ of a statement of the form “program $e$ does not accept exactly the elements of $\{n_1,n_2,\ldots,n_k\}$.” Accept nothing until such a proof is found. For the first such proof that is found, accept $n$ if and only if $n$ is one of those $n_i$’s.

In short, the program $e$ searches for a proof that $e$ doesn’t accept exactly a certain finite set, and when such a proof is found, it accepts exactly the elements of this set anyway.

Clearly, $\PA$ proves that program $e$ accepts only a finite set, since either no such proof is ever found, in which case $e$ accepts nothing (and the empty set is finite), or else such a proof is found, in which case $e$ accepts only that particular finite set. So $\PA$ proves that $e$ accepts only finitely many inputs.

But meanwhile, assuming $\PA$ is consistent, then you cannot refute the assertion that program $e$ accepts exactly the elements of some particular finite set $A$, since if you could prove that from $\PA$, then program $e$ actually would accept exactly that set (for the shortest such proof), in which case this would also be provable, contradicting the consistency of $\PA$.

Since you cannot refute any particular finite set as the accepting set for $e$, it follows that it is consistent with $\PA$ that $e$ accepts any particular finite set $A$ that you like. So there is a model of $\PA$ in which $e$ accepts exactly the elements of $A$. This establishes statement (2).

Statement (3) now follows by a simple compactness argument. Namely, for any $A\subset\N$, let $T$ be the theory of $\PA$ together with the assertions that program $e$ accepts $n$, for any particular $n\in A$, and the assertions that program $e$ does not accept $n$, for $n\notin A$. Any finite subtheory of this theory is consistent, by statement (2), and so the whole theory is consistent. Any model of this theory realizes statement (3). QED

One uses the Kleene recursion theorem to show the existence of the program $e$, which makes reference to $e$ in the description of what it does. Although this may look circular, it is a standard technique to use the recursion theorem to eliminate the circularity.

This theorem immediately implies the classical result of Mostowski and Kripke that there is an independent family of $\Pi^0_1$ assertions, since the assertions $n\notin W_e$ are exactly such a family.

The theorem also implies a strengthening of the universal program theorem that I proved last year. Indeed, the two theorems can be realized with the same program!

Theorem. There is a Turing machine program $e$ with the following properties:

  1. $\PA$ proves that $e$ computes a finite function;
  2. For any particular finite partial function $f$ on $\N$, there is a model $M\models\PA$ inside of which program $e$ computes exactly $f$.
  3. For any partial function $f:\N\to\N$, finite or infinite, there is a model $M\models\PA$ inside of which program $e$ on input $n$ computes exactly $f(n)$, meaning that $e$ halts on $n$ if and only if $f(n)\downarrow$ and in this case $\varphi_e(n)=f(n)$.

Proof. The proof of statements (1) and (2) is just as in the earlier theorem. It is clear that $e$ computes a finite function, since either it computes the empty function, if no proof is found, or else it computes the finite function mentioned in the proof. And you cannot refute any particular finite function for $e$, since if you could, it would have exactly that behavior anyway, contradicting $\text{Con}(\PA)$. So statement (2) holds. But meanwhile, we can get statement (3) by a simple compactness argument. Namely, fix $f$ and let $T$ be the theory asserting $\PA$ plus all the assertions either that $\varphi_e(n)\uparrow$, if $n$ is not the domain of $f$, and $\varphi_e(n)=k$, if $f(n)=k$.  Every finite subtheory of this theory is consistent, by statement (2), and so the whole theory is consistent. But any model of this theory exactly fulfills statement (3). QED

Woodin’s proof is more difficult than the arguments I have presented, but I realize now that this extra difficulty is because he is proving an extremely interesting and stronger form of the theorem, as follows.

Theorem. (Woodin) There is a Turing machine program $e$ such that $\PA$ proves $e$ accepts at most a finite set, and for any finite set $A\subset\N$ there is a model $M\models\PA$ inside of which $e$ accepts exactly $A$. And furthermore, in any such $M$ and any finite $B\supset A$, there is an end-extension $M\subset_{end} N\models\PA$, such that in $N$, the program $e$ accepts exactly the elements of $B$.

This is a much more subtle claim, as well as philosophically interesting for the reasons that he dwells on.

The program I described above definitely does not achieve this stronger property, since my program $e$, once it finds the proof that $e$ does not accept exactly $A$, will accept exactly $A$, and this will continue to be true in all further end-extensions of the model, since that proof will continue to be the first one that is found.

Worldly cardinals are not always downwards absolute


UniversumI recently came to realize that worldly cardinals are not necessarily downward absolute to transitive inner models. That is, it can happen that a cardinal $\kappa$ is worldly in the full set-theoretic universe $V$, but not in some transitive inner model $W$, even when $W$ is itself a model of ZFC. The observation came out of some conversations I had with Alexander Block from Hamburg during his recent research visit to New York. Let me explain the argument.

A cardinal $\kappa$ is inaccessible, if it is an uncountable regular strong limit cardinal. The structure $V_\kappa$, consisting of the rank-initial segment of the set-theoretic universe up to $\kappa$, which can be generated from the empty set by applying the power set operation $\kappa$ many times, has many nice features. In particular, it is transitive model of $\newcommand\ZFC{\text{ZFC}}\ZFC$. The models $V_\kappa$ for $\kappa$ inaccessible are precisely the uncountable Grothendieck universes used in category theory.

Although the inaccessible cardinals are often viewed as the entryway to the large cardinal hierarchy, there is a useful large cardinal concept weaker than inaccessibility. Namely, a cardinal $\kappa$ is worldly, if $V_\kappa$ is a model of $\ZFC$. Every inaccessible cardinal is worldly, and in fact a limit of worldly cardinals, because if $\kappa$ is inaccessible, then there is an elementary chain of cardinals $\lambda<\kappa$ with $V_\lambda\prec V_\kappa$, and all such $\lambda$ are worldly. The regular worldly cardinals are precisely the inaccessible cardinals, but the least worldly cardinal is always singular of cofinality $\omega$.

The worldly cardinals can be seen as a kind of poor-man’s inaccessible cardinal, in that worldliness often suffices in place of inaccessibility in many arguments, and this sometimes allows one to weaken a large cardinal hypothesis. But meanwhile, they do have some significant strengths. For example, if $\kappa$ is worldly, then $V_\kappa$ satisfies the principle that every set is an element of a transitive model of $\ZFC$.

It is easy to see that inaccessibility is downward absolute, in the sense that if $\kappa$ is inaccessible in the full set-theoretic universe $V$ and $W\newcommand\of{\subseteq}\of V$ is a transitive inner model of $\ZFC$, then $\kappa$ is also inaccessible in $W$. The reason is that $\kappa$ cannot be singular in $W$, since any short cofinal sequence in $W$ would still exist in $V$; and it cannot fail to be a strong limit there, since if some $\delta<\kappa$ had $\kappa$-many distinct subsets in $W$, then this injection would still exist in $V$. So inaccessibility is downward absolute.

The various degrees of hyper-inaccessibility are also downwards absolute to inner models, so that if $\kappa$ is an inaccessible limit of inaccessible limits of inaccessible cardinals, for example, then this is also true in any inner model. This downward absoluteness extends all the way through the hyperinaccessibility hierarchy and up to the Mahlo cardinals and beyond. A cardinal $\kappa$ is Mahlo, if it is a strong limit and the regular cardinals below $\kappa$ form a stationary set. We have observed that being regular is downward absolute, and it is easy to see that every stationary set $S$ is stationary in every inner model, since otherwise there would be a club set $C$ disjoint from $S$ in the inner model, and this club would still be a club in $V$. Similarly, the various levels of hyper-Mahloness are also downward absolute.

So these smallish large cardinals are generally downward absolute. How about the worldly cardinals? Well, we can prove first off that worldliness is downward absolute to the constructible universe $L$.

Observation. If $\kappa$ is worldly, then it is worldly in $L$.

Proof. If $\kappa$ is worldly, then $V_\kappa\models\ZFC$. This implies that $\kappa$ is a beth-fixed point. The $L$ of $V_\kappa$, which is a model of $\ZFC$, is precisely $L_\kappa$, which is also the $V_\kappa$ of $L$, since $\kappa$ must also be a beth-fixed point in $L$. So $\kappa$ is worldly in $L$. QED

But meanwhile, in the general case, worldliness is not downward absolute.

Theorem. Worldliness is not necessarily downward absolute to all inner models. It is relatively consistent with $\ZFC$ that there is a worldly cardinal $\kappa$ and an inner model $W\of V$, such that $\kappa$ is not worldly in $W$.

Proof. Suppose that $\kappa$ is a singular worldly cardinal in $V$. And by forcing if necessary, let us assume the GCH holds in $V$. Let $V[G]$ be the forcing extension where we perform the Easton product forcing $\newcommand\P{\mathbb{P}}\P$, so as to force a violation of the GCH at every regular cardinal $\gamma$. So the stage $\gamma$ forcing is $\newcommand\Q{\mathbb{Q}}\Q_\gamma=\text{Add}(\gamma,\gamma^{++})$.

First, I shall prove that $\kappa$ is worldly in the forcing extension $V[G]$. Since every set of rank less than $\kappa$ is added by some stage less than $\kappa$, it follows that $V_\kappa^{V[G]}$ is precisely $\bigcup_{\gamma<\kappa} V_\kappa[G_\gamma]$. Most of the $\ZFC$ axioms hold easily in $V_\kappa^{V[G]}$; the only difficult case is the collection axiom. And for this, by considering the ranks of witnesses, it suffices to show for every $\gamma<\kappa$ that every function $f:\gamma\to\kappa$ that is definable from parameters in $V_\kappa^{V[G]}$ is bounded. Suppose we have such a function, defined by $f(\alpha)=\beta$ just in case $\varphi(\alpha,\beta,p)$ holds in $V_\kappa^{V[G]}$. Let $\delta<\kappa$ be larger than the rank of $p$. Now consider $V_\kappa[G_\delta]$, which is a set-forcing extension of $V_\kappa$ and therefore a model of $\ZFC$. The fail forcing, from stage $\delta$ up to $\kappa$, is homogeneous in this model. And therefore we know that $f(\alpha)=\beta$ just in case $1$ forces $\varphi(\check\alpha,\check\beta,\check p)$, since these arguments are all in the ground model $V_\kappa[G_\delta]$. So the function is already definable in $V_\kappa[G_\delta]$. Because this is a model of $\ZFC$, the function $f$ is bounded below $\kappa$. So we get the collection axiom in $V_\kappa^{V[G]}$ and hence all of $\ZFC$ there, and so $\kappa$ is worldly in $V[G]$.

For any $A\of\kappa$, let $\P_A$ be the restriction of the Easton product forcing to include only the stages in $A$, and let $G_A$ be the corresponding generic filter. The full forcing $\P$ factors as $\P_A\times\P_{\kappa\setminus A}$, and so $V[G_A]\of V[G]$ is a transitive inner model of $\ZFC$.

But if we pick $A\of\kappa$ to be a short cofinal set in $\kappa$, which is possible because $\kappa$ is singular, then $\kappa$ will not be worldly in the inner model $V[G_A]$, since in $V_\kappa[G_A]$ we will be able to identify that sequence as the places where the GCH fails. So $\kappa$ is not worldly in $V[G_A]$.

In summary, $\kappa$ was worldly in $V[G]$, but not in the transitive inner model $W=V[G_A]$, and so worldliness is not downward absolute. QED

The definable cut of a model of set theory can be changed by small forcing

Cupid carving his bow -- ParmigianinoIf $M$ is a model of ZFC set theory, let $I$ be the definable cut of its ordinals, the collection of ordinals that are below an ordinal $\delta$ of $M$ that is definable in $M$ without parameters. This would include all the ordinals of $M$, if the definable ordinals happen to be unbounded in $M$, but one can also construct examples where the definable cut is bounded in $M$.  Let $M_I$ be the corresponding definable cut of $M$ itself, the rank-initial segment of $M$ determined by $I$, or in other words, the collection of all sets $x$ in $M$ of rank below a definable ordinal of $M$. Equivalently, $$M_I=\bigcup_{\delta\in I} V_\delta^M.$$ It is not difficult to see that this is an elementary substructure $M_I\prec M$, because we can verify the Tarski-Vaught criterion as follows. If $M\models\exists y\ \varphi(x,y)$, where $x\in M_I$, then let $\delta$ be a definable ordinal above the rank of $x$. In this case, the ordinal $\theta$, which is the supremum over all $a\in V_\delta$ of the minimal rank of a set $y$ for which $\varphi(a,y)$, if there is such a $y$. This supremum $\theta$ is definable, and so since $x\in V_\delta$, the minimal rank of a $y$ such that $\varphi(x,y)$ is at most $\theta$. Consequently, since $\theta\in I$, such a $y$ can be found in $M_I$. So we have found the desired witness inside the substructure, and so it is elementary $M_I\prec M$. Note that in the general case, one does not necessarily know that $I$ has a least upper bound in $M$. Under suitable assumptions, it can happen that $I$ is unbounded in $M$, that $I$ is an ordinal of $M$, or that $I$ is bounded in $M$, but has no least upper bound.

What I am interested in for this post is how the definable cut might be affected by forcing. Of course, it is easy to see that if $M$ is definable in $M[G]$, then the definable cut of $M[G]$ is at least as high as the definable cut of $M$, simply because the definable ordinals of $M$ remain definable in $M[G]$.

A second easy observation is that if the definable cut of $M$ is bounded in $M$, then we could perform large collapse forcing, collapsing a cardinal above $I$ to $\omega$, which would of course make every cardinal of $I$ countable in the extension $M[G]$. In this case, since $\omega_1^{M[G]}$ is definable, it would change the definable cut. So this kind of very large forcing can change the definable cut, making it larger.

But what about small forcing? Suppose that the forcing notion $\newcommand\P{\mathbb{P}}\P$ we intend to forcing with is small in the sense that it is in the definable cut $M_I$. This would be true if $\P$ itself were definable, for example, but really we only require that $\P$ has rank less than some definable ordinal of $M$. Can this forcing change the definable cut?

Let me show at least that the definable cut can never go up after small forcing.

Theorem. If $G\subset\P$ is $M$-generic for forcing $\P$ in the definable cut of $M$, then the definable cut of $M[G]$ is below or the same in the ordinals as it was in $M$.

Proof. Suppose that $G\subset\P$ is $M$-generic, and we consider the forcing extension $M[G]$. We have already proved that $M_I\prec M$ is an elementary submodel. I claim that this relation lifts to the forcing extension $M_I[G]\prec M[G]$. Note first that since $\P\in M_I$ and $M_I$ is a rank initial segment of $M$, it follows that $M_I$ has all the subsets of $\P$ in $M$, and so $G$ is $M_I$-generic. So the extension $M_I[G]$ makes sense. Next, suppose that $M[G]\models\varphi(a)$ for some $a\in M_I[G]$. If $\dot a$ is a name for $a$ in $M_I$, then there is some condition $p\in G$ forcing $\varphi(\dot a)$ over $M$. Since $M_I\prec M$, this is also forced by $p$ over $M_I$, and thus $M_I[G]\models\varphi(a)$ as well, as desired. So $M_I[G]\prec M[G]$, and from this it follows that every definable ordinal of $M[G]$ is in the cut $I$. So the definable cut did not get higher. QED

But can it go down? Not if the model $M$ is definable in $M[G]$, by our earlier easy observation. Consequently,

Theorem. If $M$ is definable in $M[G]$, where $G\subset\P$ is $M$-generic for forcing $\P$ below the definable cut of $M$, then the definable cut of $M[G]$ is the same as the definable cut of $M$.

Proof. It didn’t go down, since $M$ is definable in $M[G]$; and it didn’t go up, since $\P$ was small. QED

What if $M$ is not definable in $M[G]$? Can we make the definable cut go down after small forcing? The answer is yes.

Theorem. If ZFC is consistent, then there is a model $M\models\text{ZFC}$ with a definable notion of forcing $\P$ (hence in the definable cut of $M$), such that if $G\subset\P$ is $M$-generic, then the definable cut of the forcing extension $M[G]$ is strictly shorter than the definable cut of $M[G]$.

Proof. Start with a model of $\text{ZFC}+V=L$, whose definable ordinals are bounded by a cardinal $\delta$. Let’s call it $L$, and let $I$ be the definable cut of $L$, which we assume is bounded by $\delta$. Let $M=L[G]$ be the forcing extension of $L$ obtained by performing an Easton product, adding a Cohen subset to every regular cardinal above $\delta$ in $L$. Since this forcing adds no sets below $\delta$, but adds a Cohen set at $\delta^+$, it follows that $\delta$ becomes definable in $L[G]$. In fact, since the forcing is homogeneous and definable from $\delta$, it follows that the definable ordinals of $L[G]$ are precisely the ordinals that are definable in $L$ with parameter $\delta$. These may be bounded or unbounded in $L[G]$. Now, let $\newcommand\Q{\mathbb{Q}}\Q$ be the Easton product forcing at the stages below $\delta$, and suppose that $G\subset\Q$ is $L[G]$-generic. Consider the model $L[G][H]$. Note that the forcing $\Q$ is definable in $L[G]$, since $\delta$ is definable there. This two-step forcing can be combined into one giant Easton product in $L$, the product that simply forces to add a Cohen subset to every regular cardinal. Since this version of the forcing is homogeneous and definable in $L$, it follows that the definable ordinals of $L[G][H]$ are precisely the definable ordinals of $L$, which are bounded by $I$. In summary, the definable cut of $L[G]$ is strictly above $\delta$, since $\delta$ is definable in $L[G]$, and the forcing $\Q$ has size and rank $\delta$; but the forcing extension $L[G][H]$ has definable cut $I$, which is strictly bounded by $\delta$. So the definable cut was made smaller by small forcing, as claimed. QED

This post is an account of some ideas that Alexander Block and I had noted today during the course of our mathematical investigation of another matter.

Games with the computable-play paradox

The_Chess_Game_-_Sofonisba_AnguissolaLet me tell you about a fascinating paradox arising in certain infinitary two-player games of perfect information. The paradox, namely, is that there are games for which our judgement of who has a winning strategy or not depends on whether we insist that the players play according to a deterministic computable procedure. In the the space of computable play for these games, one player has a winning strategy, but in the full space of all legal play, the other player can ensure a win.

The fundamental theorem of finite games, proved in 1913 by Zermelo, is the assertion that in every finite two-player game of perfect information — finite in the sense that every play of the game ends in finitely many moves — one of the players has a winning strategy. This is generalized to the case of open games, games where every win for one of the players occurs at a finite stage, by the Gale-Stewart theorem 1953, which asserts that in every open game, one of the players has a winning strategy. Both of these theorems are easily adapted to the case of games with draws, where the conclusion is that one of the players has a winning strategy or both players have draw-or-better strategies.

Let us consider games with a computable game tree, so that we can compute whether or not a move is legal. Let us say that such a game is computably paradoxical, if our judgement of who has a winning strategy depends on whether we restrict to computable play or not. So for example, perhaps one player has a winning strategy in the space of all legal play, but the other player has a computable strategy defeating all computable strategies of the opponent. Or perhaps one player has a draw-or-better strategy in the space of all play, but the other player has a computable strategy defeating computable play.

Examples of paradoxical games occur in infinite chess. We described such a paradoxical position in my paper Transfinite games values in infinite chess by giving a computable infinite chess position with the property that both players had drawing strategies in the space of all possible legal play, but in the space of computable play, then white had a computable strategy defeating any particular computable strategy for black.

For a related non-chess example, let $T$ be a computable subtree of $2^{<\omega}$ having no computable infinite branch, and consider the game in which black simply climbs in this tree as white watches, with black losing whenever he is trapped in a terminal node, but winning if he should climb infinitely. This game is open for white, since if white wins, this is known at a finite stage of play. In the space of all possible play, Black has a winning strategy, which is simply to climb the tree along an infinite branch, which exists by König’s lemma. But there is no computable strategy to find such a branch, by the assumption on the tree, and so when black plays computably, white will inevitably win.

For another example, suppose that we have a computable linear order $\lhd$ on the natural numbers $\newcommand\N{\mathbb{N}}\N$, which is not a well order, but which has no computable infinite descending sequence. It is a nice exercise in computable model theory to show that such an order exists. If we play the count-down game in this order, with white trying to build a descending sequence and black watching. In the space of all play, white can succeed and therefore has a winning strategy, but since there is no computable descending sequence, white can have no computable winning strategy, and so black will win every computable play.

There are several proofs of open determinacy (and see my MathOverflow post outlining four different proofs of the fundamental theorem of finite games), but one of my favorite proofs of open determinacy uses the concept of transfinite game values, assigning an ordinal to some of the positions in the game tree. Suppose we have an open game between Alice and Bob, where the game is open for Alice. The ordinal values we define for positions in the game tree will measure in a sense the distance Alice is away from winning. Namely, her already-won positions have value $0$, and if it is Alice’s turn to play from a position $p$, then the value of $p$ is $\alpha+1$, if $\alpha$ is minimal such that she can play to a position of value $\alpha$; if it is Bob’s turn to play from $p$, and all the positions to which he can play have value, then the value of $p$ is the supremum of these values. Some positions may be left without value, and we can think of those positions as having value $\infty$, larger than any ordinal. The thing to notice is that if a position has a value, then Alice can always make it go down, and Bob cannot make it go up. So the value-reducing strategy is a winning strategy for Alice, from any position with value, while the value-maintaining strategy is winning for Bob, from any position without a value (maintaining value $\infty$). So the game is determined, depending on whether the initial position has value or not.

What is the computable analogue of the ordinal-game-value analysis in the computably paradoxical games? If a game is open for Alice and she has a computable strategy defeating all computable opposing strategies for Bob, but Bob has a non-computable winning strategy, then it cannot be that we can somehow assign computable ordinals to the positions for Alice and have her play the value-reducing strategy, since if those values were actual ordinals, then this would be a full honest winning strategy, even against non-computable play.

Nevertheless, I claim that the ordinal-game-value analysis does admit a computable analogue, in the following theorem. This came out of a discussion I had recently with Noah Schweber during his recent visit to the CUNY Graduate Center and Russell Miller. Let us define that a computable open game is an open game whose game tree is computable, so that we can tell whether a given move is legal from a given position (this is a bit weaker than being able to compute the entire set of possible moves from a position, even when this is finite). And let us define that an effective ordinal is a computable relation $\lhd$ on $\N$, for which there is no computable infinite descending sequence. Every computable ordinal is also an effective ordinal, but as we mentioned earlier, there are non-well-ordered effective ordinals. Let us call them computable pseudo-ordinals.

Theorem. The following are equivalent for any computable game, open for White.

  1. White has a computable strategy defeating any computable play by Black.
  2. There is an effective game-value assignment for white into an effective ordinal $\lhd$, giving the initial position a value. That is, there is a computable assignment of some positions of the game, including the first position, to values in the field of $\lhd$, such that from any valued position with White to play, she can play so as to reduce value, and with Black to play, he cannot increase the value.

Proof. ($2\to 1$) Given the computable values into an effective ordinal, then the value-reducing strategy for White is a computable strategy. If Black plays computably, then together they compute a descending sequence in the $\lhd$ order. Since there is no computable infinite descending sequence, it must be that the values hit zero and the game ends with a win for White. So White has a computable strategy defeating any computable play by Black.

($1\to 2$) Conversely, suppose that White has a computable strategy $\sigma$ defeating any computable play by Black. Let $\tau$ be the subtree of the game tree arising by insisting that White follow the strategy $\sigma$, and view this as a tree on $\N$, a subtree of $\N^{<\omega}$. Imagine the tree growing downwards, and let $\lhd$ be the Kleene-Brouwer order on this tree, which is the lexical order on incompatible positions, and otherwise longer positions are lower. This is a computable linear order on the tree. Since $\sigma$ is computably winning for White, the open player, it follows that every computable descending sequence in $\tau$ eventually reaches a terminal node. From this, it follows that there is no computable infinite descending sequence with respect to $\lhd$, and so this is an effective ordinal. We may now map every node in $\tau$, which includes the initial node, to itself in the $\lhd$ order. This is a game-value assignment, since on White’s turn, the value goes down, and it doesn’t go up on Black’s turn. QED

Corollary. A computable open game is computably paradoxical if and only if it admits an effective game value assignment for the open player, but only with computable pseudo-ordinals and not with computable ordinals.

Proof. If there is an effective game value assignment for the open player, then the value-reducing strategy arising from that assignment is a computable strategy defeating any computable strategy for the opponent. Conversely, if the game is paradoxical, there can be no such ordinal-assignment where the values are actually well-ordered, or else that strategy would work against all play by the opponent. QED

Let me make a few additional observations about these paradoxical games.

Theorem. In any open game, if the closed player has a strategy defeating all computable opposing strategies, then in fact this is a winning strategy also against non-computable play.

Proof. If the closed player has a strategy $\sigma$ defeating all computable strategies of the opponent, then in fact it defeats all strategies of the opponent, since any winning play by the open player against $\sigma$ wins in finitely many moves, and therefore there is a computable strategy giving rise to the same play. QED

Corollary. If an open game is computably paradoxical, it must be the open player who wins in the space of computable play and the closed player who wins in the space of all play.

Proof. The theorem shows that if the closed player wins in the space of computable play, then that player in fact wins in the space of all play. QED

Corollary. There are no computably paradoxical clopen games.

Proof. If the game is clopen, then both players are closed, but we just argued that any computable strategy for a closed player winning against all computable play is also winning against all play. QED

Buckets of fish!

Let me tell you about the game Buckets of fishReef_shark_beneath_a_school_of_jack_fish 4096

This is a two-player game played with finitely many buckets in a line on the beach, each containing a finite number of fish. There is also a large supply of additional fish available nearby, fresh off the boats.

Taking turns, each player selects a bucket and removes exactly one fish from it and then, if desired, adds any finite number of fish from the nearby supply to the buckets to the left.

For example, if we label the buckets from the left as 1, 2, 3 and so on, then a legal move would be to take one fish from bucket 4 and then add ten fish to bucket 1, no fish to bucket 2, and ninety-four fish to bucket 3. The winner is whoever takes the very last fish from the buckets, leaving them empty.

Since huge numbers of fish can often be added to the buckets during play, thereby prolonging the length of play, a skeptical reader may wonder whether the game will necessarily come to an end. Perhaps the players can prolong the game indefinitely? Or must it always come to an end?

Question. Does every play of the game Buckets of fish necessarily come to an end?

The answer is yes, every game must eventually come to a completion. I shall give several arguments.

Theorem. Every play of the game Buckets of fish ends in finitely many moves. All the fish in the buckets, including all the new fish that may have been added during play, will eventually run out by some finite stage during play.

That is, no matter how the players add fish to the buckets during play, even with an endless supply of fish from the boats, they will eventually run out of fish in the buckets and one of the players will take the last fish.

First proof. We prove the claim by (nested) induction on the number of buckets. If there is only one bucket, then there are no buckets to the left of it, and so there is no possibility in this case to add fish to the game. If the one bucket contains $k$ fish, then the game clearly ends in $k$ moves. Assume by induction that all plays using $n$ buckets end in finitely many moves, and suppose that we have a game situation with $n+1$ buckets, with $k$ fish in bucket $n+1$. We now prove by induction on $k$ that all such games terminate. This argument is therefore an instance of nested induction, since we are currently inside our proof by induction on $n$, in the induction step of that proof, and in order to complete it, we are undertaking a separate full induction on $k$. If $k=0$, then there are no fish in bucket $n+1$, and so the game amounts really to a game with only $n$ buckets, which terminates in finitely many steps by our induction hypothesis on $n$. So, let us assume that all plays with $k$ fish in bucket $n+1$ terminate in finitely many moves. Consider a situation where there are $k+1$ many fish in that bucket. I claim that eventually, one of those fish must be taken, since otherwise all the moves will be only on the first $n$ buckets, and all plays on only $n$ buckets terminate in finitely many moves. So at some point, one of the players will take a fish from bucket $n+1$, possibly adding additional fish to the earlier buckets. But this produces a situation with only $k$ fish in bucket $n+1$, which by our induction assumption on $k$ we know will terminate in finitely many steps. So we have proved that no matter how many fish are in bucket $n+1$, the game will end in finitely many moves, and so the original claim is true for $n+1$ buckets. Thus, the theorem is true for any finite number of buckets. QED

A second proof. Let me now give another proof, following an idea arising in a conversation with Miha Habič. We want to prove that there is no infinitely long play of the game Buckets of fish. Suppose toward contradiction that there is a way for the players to conspire to produce an infinite play, starting from some configuration of some finite number $n$ of buckets, each with finitely many fish in them. Fix the particular infinitely long play. Let $m$ be the right-most bucket from which a fish was taken infinitely often during that infinite course of play. It follows, for example, that $m<n$, since the top bucket can be used only finitely often, as it never gets replenished. Since bucket $m$ starts with only finitely many fish in it, and each time it is replenished, it is replenished with only finitely many fish, it follows that in order to have been used infinitely many times, it must also have been replenished infinitely often. But each time it was replenished, it was because there was some bucket further to the right that had been used. Since there are only finitely many buckets to the right of bucket $m$, it follows that one of them must have been used infinitely often. This contradicts the choice of $m$ as the right-most bucket that was used infinitely often. QED

A third proof. Let me now give a third proof, using ordinals. We shall associate with each Buckets-of-fish position a certain ordinal. With the position $$7\quad 2\quad 5\quad 24,$$ for example, we associate the ordinal $$\omega^3\cdot 24+\omega^2\cdot 5+\omega\cdot 2+7.$$ More generally, the number of fish in each bucket of a position becomes the coefficient of the corresponding power of $\omega$, using higher powers for the buckets further to the right. The key observation to make is that these associated ordinals strictly descend for every move of the game, since one is reducing a higher-power coefficient and increasing only lower-power coefficients. Since there is no infinite descending sequence of ordinals, it follows that there is no infinite play in the game Buckets of fish. This idea also shows that the ordinal game values of positions in this game are bounded above by $\omega^\omega$, and every ordinal less than $\omega^\omega$ is realized by some position. QED

OK, fine, so now we know that the game always ends. But how shall we play? What is the winning strategy? Say you are faced with buckets having fish in the amounts: $$4\quad 5\quad 2\quad 0\quad 7\quad 4$$ What is your winning move? Please give it some thought before reading further.




The winning strategy turns out to be simpler than you might have expected.

Theorem. The winning strategy in the game Buckets of fish is to play so as to ensure that every bucket has an even number of fish.

Proof. Notice first, as a warm-up, that in the case that there is only one bucket containing an even number of fish, then the second player will win, since the first player will necessarily make it odd, and then the second player will make it even again, and so on. So it will be the second player who will make it zero, winning the game. So with one bucket, the player who can make the bucket even will be the winner.

Next, notice that if you play so as to give your opponent an even number of fish in every bucket, then whatever move your opponent makes will result in an odd number of fish in the bucket from which he or she takes a fish (and possibly also an odd number of fish in some of the earlier buckets as well, if they happen to add an odd number of fish to some of them). So if you give your opponent an all-even position, then they cannot give you back an all-even position.

Finally, notice that if you are faced with a position that is not all-even, then you can simply take a fish from the right-most odd bucket, thereby making it even, and add fish if necessary to the earlier buckets so as to make them all even. In this way, you can turn any position that is not all-even into an all-even position in one move.

By following this strategy, a player will ensure that he or she will take the last fish, since the winning move is to make the all-zero position, which is an all-even position, and the opponent cannot produce an all-even position. QED

In the particular position of the game mentioned before the theorem, therefore, the winning move is to take a fish from the bucket with 7 fish and add an odd number of fish to the bucket with 5 fish, thereby producing an all-even position.

Finally, let’s consider a few variations of the game. It is clear that the all-even strategy works in the versions of the game where one is limited to add at most one fish to each of the earlier buckets, and this version of the game is actually playable, since the number of fish does not grow too much. A similar variation arises where one can either or add or remove any number of fish (or just at most one) from any of the earlier buckets, or where one can, say, add either 5 or 6 fish only to each of the earlier buckets. What is important in the argument is simply that one should be able to ensure the all-even nature of the buckets.

For a more interesting variation, consider what I call the Take 3 version of the game, where one can take either one, two or three fish from any bucket and then add any number of fish to the earlier buckets. The game must still eventually end, but what is the winning strategy?

Question. What is your strategy in the Take 3 variation of Buckets of fish?

Please post your answers in the comments, and I’ll post an answer later. One can generalize this to the Take $n$ variation, where on each turn, the player is allowed to take between 1 and $n$ fish from any bucket, and add as many fish as desired to the earlier buckets.

Another puzzling variation is where each player can take any number of fish from a bucket, and then add any number of fish to earlier buckets. Can you find a strategy for this version of the game? Please post in the comments.

All triangles are isosceles

Let me share a mathematical gem with you, the following paradoxical “theorem.”

Theorem. Every triangle is isosceles.

Proof. Consider an arbitrary triangle $\triangle ABC$. Let $Q$ be the intersection of the angle bisector (blue) at $\angle A$ and the perpendicular bisector (green) of $BC$ at midpoint $P$.

Isosceles triangle

Drop perpendiculars from $Q$ to $AB$ at $R$ and to $AC$ at $S$. Because $P$ is the midpoint of $BC$ and $PQ$ is perpendicular, we deduce $BQ\cong CQ$ by the Pythagorean theorem. Since $AQ$ is the angle bisector of $\angle A$, the triangles $AQR$ and $AQS$ are similar, and since they share a hypotenuse, they are congruent. It follows that $AR\cong AS$ and also $QR\cong QS$. Therefore $\triangle BQR$ is congruent to $\triangle CQS$ by the hypotenuse-leg congruence theorem. So $RB\cong SC$. And therefore,
$$AB\cong AR+RB\cong AS+SC\cong AC,$$
and so the triangle is isosceles, as desired. QED

Corollary.  Every triangle is equilateral.

Proof. The previous argument proceeded from an arbitrary vertex of the triangle, and so any pair of adjacent sides in the triangle are congruent. So all three are congruent, and therefore it is equilateral. QED

Perhaps you object to my figure, because depending on the triangle, perhaps the angle bisector of $A$ passes on the other side of the midpoint $P$ of $BC$, which would make the point $Q$ lie outside the triangle, as in the following figure.

Isosceles triangle 2

Nevertheless, essentially the same argument works also in this case. Namely, we again let $Q$ be the intersection of the angle bisector at $\angle A$ with the perpendicular bisector of $BC$ at midpoint $P$, and again drop the perpendiculars from $Q$ to $R$ and $S$. Again, we get $BQ\cong CQ$ by the Pythagorean theorem, using the green triangles. And again, we get $\triangle ARQ\cong\triangle ASQ$ since these are similar triangles with the same hypotenuse. So again, we conclude $\triangle BQR\cong\triangle CQS$ by hypotenuse-leg. So we deduce $AB\cong AR-BR\cong AS-CS\cong AC$, by subtracting rather than adding as before, and so the triangle is isosceles.

Question. What is wrong with these arguments?

Please post your answers in the comments below.

The argument is evidently due to E. A. Maxwell, Fallacies in Mathematics, 1959. I first heard it years ago, when I was in graduate school.  Shortly afterward, my advisor W. Hugh Woodin happened to be a little late to seminar, and so I leaped to the chalkboard and gave this proof, leaving the distinguished audience, including R. Solovay, scratching their heads for a while. Woodin arrived, but Solovay wouldn’t let him start the seminar, since he wanted to resolve the triangle argument. What fun!

There are no regular polygons in the hexagonal lattice, except triangles and hexagons

I should like to follow up my post last week about the impossibility of finding regular polygons in the integer lattice. This time, however, we shall consider the hexagonal and triangular lattices.  It is easy to find lattice points that form the vertices of an equilateral triangle or a regular hexagon.


And similarly, such figures can be found also in the triangular lattice.

hex-tri-gridIndeed, the triangular and hexagonal lattices are each refinements of the other, so they will allow exactly the same shapes arising from lattice points.

But can you find a square? How about an octagon?

Question.  Which regular polygons can be formed by vertices of the hexagonal or triangular lattices?

The answer is that in fact there are no other types of regular polygons to be found! I’d like to prove this by means of some elementary classical reasoning.

Theorem. The only non-degenerate regular polygons that arise from vertices in the hexagonal or triangular lattices are the equilateral triangles and regular hexagons.

This theorem extends the analysis I gave in my post last week for the integer lattice, showing that there are no regular polygons in the integer lattice, except squares.

Proof.  The argument for the hexagonal and triangular lattices uses a similar idea as with the integer lattice, but there is a little issue with the square and pentagon case. We can handle both the hexagonal and triangular lattices at the same time. The crucial fact is that both of these lattices are invariant by $120^\circ$ rotation about any lattice vertex.

To get started, suppose that we can find a square in the hexagonal lattice. square We may rotate this square by $120^\circ$ about the vertex $a$, and the square vertices will all land on lattice-vertex points. Next, we may rotate the resulting square about the point $b$, and again the vertices will land on lattice points. So we have described how to transform any square vertex point $a$ to another lattice point $c$ which is strictly inside the square. By applying that operation to each of the four vertices of the square, we thereby arrive by symmetry at a strictly smaller square. Thus, for any square in the hexagonal or triangular lattice, there is a strictly smaller square. But if there were any square in the lattice at all, there would have to be a square with smallest side-length, since there are only finitely many lattice distances less than any given length. So there can be no square in the hexagonal or triangular lattice.

The same construction works with pentagons. pentagon

If there is a pentagon in the lattice, then we may rotate it about point $a$, and then again about point $b$, resulting in point $c$ strictly inside the pentagon, which leads to an infinite sequence of strictly smaller pentagon, whose sizes (by similarity) go to zero. So there can be no pentagon in the hexagonal or triangular lattices.

If we attempt to apply this argument with triangles or hexagons, then the process simply leads back again to points in the original figure. But of course, since triangles and hexagons do arise in these grids, we didn’t expect the construction to work with them.

triangle hexagon





But also, this particular two-step rotation construction does not succeed with the heptagon (7-sides) or larger polygons, since the resulting point $c$ ends up outside the original heptagon, which means that the new heptagon we construct ends up being larger, rather than smaller than the original.








Fortunately, for the 7-gon and higher, we may fall back on the essential idea used in the square lattice case. Namely, because the interior angles of these polygons are now larger than $120^\circ$, we may simply rotate each side of the polygon by $120^\circ$ and thereby land at a lattice point. In this way, we construct a proportionally smaller instance of the same regular $n$-gon, and so there can be no smallest instance of such a polygon.









In summary, in every case of a regular polygon except the equilateral triangle and the regular hexagon, we found that by means of $120^\circ$ rotations we were able to find a strictly smaller instance of the polygon. Therefore, there can be no instances of such polygons arising from lattice points in the hexagonal or triangular tilings. QED

See my earlier post: there are no regular polygons in the integer lattice, except squares.

There are no nondegenerate regular polygons in the integer lattice, except for squares

Consider a regular square lattice, formed by a grid of intersection points of uniformly spaced parallel horizontal and vertical lines in the plane.  One may easily find points that form the vertices of a square in this lattice.
But can one find points that form the vertices of regular hexagon? Or a regular pentagon? How about an equilateral triangle? And so on.


The answer is that one cannot find these shapes using vertices in the square lattice.

Theorem. There is no regular pentagon in the square lattice, and no regular hexagon, no regular heptagon, and so on. Indeed, the only nondegenerate regular polygons to be found using vertices in a square lattice are squares themselves.

I think this must be a classical result. I was inspired by Vaughn Climenhaga’s beautiful image in his Proof without words answer on MathOverflow, which handled the case of a hexagon. Reflecting upon it, I realized that the same idea works with other regular polygons, and I endeavored to produce the corresponding images, below.

Proof. Suppose that we could find five vertices in the square lattice that formed a regular pentagon.  Construct on each side a perpendicular of the same length, as pictured in brown below. Since the lattice is invariant under $90^\circ$ rotations centered at a lattice point, each of these new points is still a lattice point. And by symmetry, they form the vertices of a (proportionally smaller) regular pentagon. Therefore, there can be no regular pentagon at all in the square lattice, since if there is one, it is clear that there would have to be a smallest instance.   pentagon

An alternative way to argue is: by similarity, the size of the smaller pentagon shrinks by the same factor with each step, and so in the limit the size approaches zero; but clearly, we cannot have a lattice-point regular pentagon whose size is smaller than the square lattice spacing itself. So there is no regular pentagon in the square lattice.

The same argument works with larger regular polygons.  The main point to realize is that for all regular $n$-gons, where $n>4$, when you construct the perpendicular on one of the sides, the resulting point is strictly inside the original polygon, and this is why the resulting regular $n$-gon is strictly smaller than the original. This completes the proof for all $n$-gons for $n>4$.
















The case of an equilateral triangle requires special care. If one attempts the same construction idea as above, building the perpendicular on the edges of a triangle, then the resulting triangle becomes larger, rather than smaller, and so the proof method breaks down.


Nevertheless, one can reduce the equilateral triangle case to a hexagon: if you could find an equilateral triangle in the square lattice, then since the lattice is invariant under translation via a lattice-point line segment, it follows that we could build a regular hexagon. But we have already showed that we cannot find a regular hexagon in the square lattice, and so we cannot find an equilateral triangle.


So we’ve completed the proof for all nondegenerate regular polygons, except the square. QED

For those who might be interested, here is the tex code I used to generate the nested polygons. The code accepts input $n$ to produce a regular $n$-gon with the perpendiculars constructed.




\foreach \k in {0,...,\d} {
\foreach \i in {0,...,\m} {
\coordinate (x) at (360*\i/\n+\f:\R);
\coordinate (y) at (360*\i/\n+360/\n+\f:\R);
% the perp indicator
\ifthenelse{\k=0\AND\i=\m\AND\n<20}{\draw[very thin,black!\c!white] ($(x)!.85!(y)$) node (p) {} --
($(p)!1!90:(y)$) node (q) {} -- ($(q)!1!90:(p)$);}{}
% connecting to lower level
\draw [line width=.5*exp(-\k/2),GonColorC!\c!white] (y) -- ($(y)!1!-90:(x)$);
% edge and vertices of current gon
\draw [line width=1.5*exp(-\k/2),GonColor!\c!white] (x) node
[circle,scale=.3*\R,black!\c!white,fill=black!\c!white] {} -- (y);

\draw (\f:\R) node [circle,scale=.3*\R,black!\c!white,fill=black!\c!white] {};


\foreach\n in {5,6,7,8,9,10,12,16,20} {


See my follow-up post on the regular polygons arising in the hexagonal and triangular latticeshex-grid

A lifetime of hot air

We’ve been making some Fermi estimations in the Math for Liberal Arts class I am teaching, and today we discuss the following question:

Question. If you collect all the hot-air that you have breathed in your life, what would the volume be? If you made a hot-air balloon, would it be able to lift you and all your possessions?

To answer, let’s start with the first part. How much do I breathe? If I imagine inhaling and then exhaling a deep, big breath, I figure I could inflate a small paper bag, perhaps well over one liter, but probably not as much as two liters. But my passive resting breathing is probably much less than a big deep breath. So let’s figure a half liter per ordinary passive breath. How often do I breathe? Well, in the swimming pool, I can hold my breath under water for a minute or even two minutes (in my younger swim-team days); but if I hold my breath right now, I can say that it does start to feel a little unnatural, like I should take my next breath, even after just about five or ten seconds, even though this impulse could be resisted longer. It seems to me that my body wants to take another breath in about that time. If we breathe every five seconds, that would mean 12 breaths per minute, so let’s say ten breaths per minute, which would mean a volume of 5 liters per minute. That makes $5\times 60=300$ liters per hour, or $300*24=7200$ liters per day. In a year, this would be $7200\times 365$, which is less than 7000 times 400, which is 2,800,000 liters per year. Let’s say 2.5 million liters per year of hot air. Times 50 years would make $125$ million liters of hot air in all.

Now, each liter of air fills a cube 10 cm on a side, and one thousand of these fit into a cubic meter. So we’ve got 125,000 cubic meters of hot air. This is the same as a cube 50 meters by 50 meters by 50 meters. That is my hot-air balloon! Filled with a lifetime of hot air. (This is considerably larger, more than one hundred times as large, as a typical recreational hot-air balloon, which I understand are usually under 1000 cubic meters. From this point of view, it would seem likely that it could lift me and all my possessions, although body temperature may be much less than is achieved in those balloons.)

If the air was at my body temperature ($98.6^\circ$ F), then would it be able to lift me and all my possessions? Well, let’s see how much it would lift. Hot air expands in proportion to temperature (from absolute zero). If it is a day like today, about $50^\circ$ degrees F, which is about a 50 degree F difference, and absolute zero is minus 460 F. So this is about a 10 per cent increase in temperature. (In metric: we have body temperature of about 37 degrees, and it is about 10 degrees Celsius today, so a difference of 27 degrees, and absolute zero is minus 270, so about ten per cent increase, as I had said.)

So the heat of the hot air caused it to expand in volume by ten percent. The buoyant force of the hot-air balloon is exactly the weight of this displaced air, by the Archimedean principle. Thus, the lifting force of my hot-air balloon will be equal to the mass of air filling ten percent of the volume we computed. How much does air weigh? I happen to remember from my high school science class that one mole of air at one atmosphere of pressure is about 22 liters (my teacher had a cube of exactly that size sitting on his desk, to help us to visualize it). And I also know that air is mainly nitrogen, which forms the molecule $N_2$, and since nitrogen in the periodic table has an atomic number of 14, the molecule $N_2$ has a mass of 28 grams per mole. So air weighs about 28 grams per 22 liters, which is about 1.3 grams per liter. Each cubic meter is one thousand liters, and so 1.3 kilograms per cubic meter (this is much larger than I had expected—air weighs more than one kilogram per cubic meter!). My hot air in total was 125,000 cubic meters, and we said that because of the temperature difference, the volume expanded by ten percent, or 12,500 cubic meters. This expansion would displace an equal volume of air, which weighs 1.3 kg per cubic meter. Thus, the displaced air weighs $12,500\times 1.3\approx 16,000$ kilograms, or about 16 metric tons. So all my hot air, at body temperature in a giant hot air balloon on a chilly day, would have a lifting force able to lift 16 metric tons.

Would this lift me and all my possessions? Do I own 16 tons of stuff? Well, thankfully, I don’t own a car, which would be a ton or more by itself. But I do own a lot of books, a piano, an oven, a dishwasher, some heavy furniture, paintings, and various other items, as well as a collection of large potted plants on my terrace. It seems likely to me that I could fit most if not all my possessions within 16 tons. To gain a little confidence in this, let’s estimate the mass of my books. My wife and I have about twelve large shelves filled with books, each about 2 meters, and then I have about 3 more such shelves filled with books in my offices at the university. If we count half of the home books as mine, plus my office books, that makes 9 shelves times two meters, for about 20 meters of books. If the books are about 25 cm tall on average, and 15 cm deep, that makes $20\times .25\times .15=.75$ cubic meters of books. Let’s round up to one cubic meter of books. How much does a cubic meter of paper weigh? Well, one ream of copy paper weighs about 2 kilograms, and that is a volume of 8.5 by 11 by 2 inches. One meter is about 33 inches, and so we could fill one cubic meter with a pile of reams of copy paper 3 by 3 by 17, which would be 163 reams, or about 300 kilograms. So not even a half ton of books! So I can definitely lift all my most important possessions within the 16 tons.

Final answer: Yes, if we filled a giant balloon with all the hot-air I have breathed in my life, at body temperature, then it would lift me and all my possessions.

On number sense — Would one day of NYC coffee fill the Statue of Liberty? And other fun questions…

Today I gave a lecture on what I call number sense, using a process of estimation and approximation in order to calculate various unknown quantities, including a few fantastical ones. How much coffee is made per day in New York City? Would it fill up the Statue of Liberty? Approximately how many babies are born in New York City each day? If you made a stack of quarters to reach the distance to the moon, what would the dollar-value be? If you piled those quarters in a heap, would it fit in Central Park? How much does the Empire State Building weigh?

These kinds of back-of-the-envelope calculations, in my view, have at their heart the idea that one can solve a difficult or seemingly impossible problem by breaking it into more manageable pieces. We don’t just pull a final answer out of the air, but rather make simplifying assumptions and informed estimates for related quantities that we are more familiar with or have knowledge about, and then use that information to derive a better estimate for the main quantity. For most of these questions, at the outset we may have little idea what would be a reasonable answer, but by the end, we gain some insight and find ourselves a little closer to the truth.

These kind of calculations are also known as Fermi estimations, in light of Fermi’s remarkable ability to make surprisingly accurate estimations on the basis of little or no hard data. The wikipedia page (thanks to Artie Prendergrast-Smith for mentioning this link in the comments) emphasizes that even in a case where the estimate is significantly off the true value, nevertheless we may still find value in the Fermi calculation, because it focusses our attention to the reasons for the divergence. In discovering which of the assumptions underlying our calculations was wrong, we come to a deeper understanding of the true situation.

In the lecture, I began with some very easy cases. For example, how many seats were in the auditorium? The students estimated that there were approximately 12 seats per row and about 10 rows, so 120 seats in all. How old was one of the students, in seconds? Well, he was 18 years old, and so we could simply multiply each year by 365 days, times 24 hours per day, times 60 minutes per hour, times 60 seconds per minute, to get
$$18\times 365\times 24\times 60\times 60\approx 600,000,000 \text{ seconds}.$$
One student objected about leap days, since 365 should be 365.25 or so. But I pointed out that this difference was not as important as it might seem, since already we had made far larger rounding assumptions. For example, the student was not exactly 18 years old, but 18 years old and some several months; by using 18 years only, we made a bigger difference in the answer than caused by the leap-day issue, which would be a difference of only five days or so over 18 years. For the same reason, we should feel free to round the numbers to make the calculation easier. We are aiming at a ballpark estimate rather than an exact answer.

Let’s now do some more interesting cases.

anthora-cupCoffee in New York. How much coffee is made each day in New York? Would it fill the Statue of Liberty? First, let me say that I really don’t have any definite information about how much coffee is made each day in New York, and I fear that my own coffee-obsessed perspective will lead me to over-estimate the amount, but let’s give it a try. New York City has a population of approximately 10 million people. Some of those people, like myself, drink a large amount of coffee each day, but many of the others do not drink coffee at all. I would think that a sizable percentage of the NYC population does drink coffee, perhaps as much as a third or even half consumes coffee daily. Many of those coffee-drinkers have more than one cup per day, and also surely more coffee is made than consumed. So it seems reasonable to me to estimate that we have approximately one medium cup of coffee per person on average per day in New York. Basically, we’re saying that the heavy coffee drinkers and the made-but-not-sold coffee approximately makes up for those who abstain, making the average about one cup per person. So we are talking about 10 million cups of coffee per day. A medium cup of brewed coffee at Starbucks is I think about 12-16 ounces, a little less than a pint, and so let’s say about 3 cups per liter. This amounts to roughly 3 million liters of coffee.

Would it fill the Statue of Liberty? The statue itself is, I estimate, about twenty stories tall, counting the base, and if each story is 15 ft, or 5 meters, that would mean 100 meters tall, counting the base. But I think that the base is about half the height, so let’s say 50 meters for the actual statue itself. I’ve never been inside the statue, but my students say that it is about 10 meters across inside, a little more at the bottom than near the top. If we approximated it as a rectangular solid, that would give a volume of $10\times 10\times 50$ cubic meters, or 5000 cubic meters. But since the statue tapers as you go up, particularly in the arm holding the torch, it really is more like a cone than a rectangular solid, and so we should divide by three. But let’s divide just by two, because she isn’t quite as tapered as a cone. So the Statue of Liberty has a volume of approximately 2500 cubic meters. One cubic meter can be thought of as a 10 by 10 by 10 array of little 10cm cubes, and each of those is exactly one liter. So a cubic meter is 1000 liters, and therefore the Statue of Liberty has a volume of $2500\times 1000=2.5$ million liters. But since we had 3 million liters of coffee, the answer our calculation arrives at is: Yes, one day’s worth of New York coffee would fill up the Statue of Liberty!

Well, we do not have perfect confidence in our estimates and assumptions — for example, perhaps there are many fewer coffee drinkers in New York than we estimated or perhaps we underestimated the volume of the Statue of Liberty. Since the estimated volumes were of basically similar magnitudes, we aren’t really entitled to say that definitely the coffee would fill up the Statue of Liberty. Rather, what we have come to know is that those two volumes are comparably similar in size; they are in the same ballpark.

Elevator trips. While riding downtown last weekend with my son on the subway, a crowded 4 train, we overheard the group standing next to us talking about elevators. One lady said, “My elevator company serves as many elevator trips in New York City in five days as the population of the entire world,” and the rest of her group, impressed, nodded affirmatively in reply. But my thoughts, upon hearing that, were to make a quick calculation. Suppose all 10 million NYC residents rode an elevator 10 times every day, which is way too high (probably one trip per person per day is more reasonable, since many people live and work in buildings without elevators). Even in this extreme case of ten trips per person per day, it would mean only 100 million trips total per day, or 500 million trips over 5 days. This is much less than the world population, and so no way is that person’s claim true, especially since there are also many elevator companies. I thought of mentioning my calculation to those people on the subway, but decided against it. Walking out of the subway in the East Village, however, I asked my son (14 years old) whether he heard those people talking about elevators, and he replied, “Oh, yes, and when they said that, I calculated it in my head: no way is that true.” He then proceeded to explain his calculation, similar to mine. Yay, Horatio!

The Chicago marathon. In the run-up to the Chicago marathon this year, on a route that would wind through the windy city streets, Newsweek magazine reported, “Chicago Marathon organizers expect 1.7 million fans to line the route this year.” (Thanks to the critical math commentary of Mark Iris for bringing this example to my attention.) The organizers had emphasized the economic impact of these spectators, many of whom would presumably be eating in Chicago restaurants and staying at Chicago hotels. But is this a reasonable number?

Let’s calculate. A marathon is approximately 26 miles, and the route has two sides for spectators, so let’s round it to 50 miles of spectator viewing spots. Each mile is about 1800 yards, so we have $50\times 1800=90000$ yards of viewing spots. Each spectator, standing shoulder-to-shoulder, with all their stuff, takes up about a yard of space. So if the marathon was lined on both sides for the entire route with spectators standing shoulder-to-shoulder, this would amount to about 90,000 spectators. In order to have 1.7 million spectators, therefore, they would have to be lined up behind each other. But even if the spectators were 10 people deep on each side for the entire route, which is a vast crowd, it would still amount only to 900,000 people. We would have basically to double this to get to 1.7 million. So, if the live event really had 1.7 million spectators lining the route, then it would mean that the race was lined 20 people deep on each side for the entire route. No way is this number correct! I have never had the chance to attend the Chicago marathon, but at the New York marathon, which I assume is comparable, I know that while there are thick crowds at the finish line in Central Park and at some of the other prominent or especially interesting race locations, most of the rest of the route is much thinner, and at the typical nothing-special location, one could expect easily to have a front-row spot.

College of Staten IslandHow many bricks on the college campus? Our campus, covered with some lovely woods and green meadows, has a number of brick Georgian buildings. Most of these are the same standard size as the mathematics department, but there are also some larger buildings such as the library, the performing arts center, the campus center and the gymnasium. At my lecture, the students and I agreed that altogether it amounted to about 30 buildings of the size of the math building. This building is approximately 30 meters by 70 meters, which would make a perimeter of 200 meters, but because the building has wings and is not purely rectangular, let’s add another 100 meters for the folds in the walls, so about 300 meters around the base of the building. The building is two stories tall, about 10 meters tall, making the area of the walls about 3000 square meters. Of course, there are windows and doors that cut out of these walls, but let’s say that they are approximately accounted for by the extra bricks used in the trimming flourishes at the corners and top of the building. So we have 3000 square meters of brick. Each brick is approximately 10 cm by 30 cm, and so one square meter of brick would have about ten rows of a little more than three bricks across, or about 20 bricks. So each building has about $3000\times 20=60,000$ bricks. And with 30 buildings, this amounts to $30\times 60,000=180,000$ bricks in the buildings on campus. There are also some bricks in the central fountain, a few small brick walls and some bricks lining some of the walkways. So let’s add another ten percent for those extra bricks, arriving at about 200,000 bricks on campus in all.

Positive test result for a rare disease. Suppose that as part of your check-up, your doctor randomly administers a clinical test for a certain rare medical condition. The test is 99 percent accurate, in terms of false positives and false negatives, in the sense that 99 percent of people taking the test have an accurate result with the test. Suppose also that the disease itself is rare, held by only one in a million people. If your test comes back positive, what is the likelihood that you actually have the disease?

This is a subtle question. It might seem to make sense initially that there is a 99 percent chance that you have the disease, since that is the accuracy of the test. But this isn’t actually right, because it doesn’t account for the fact that the disease itself is extremely rare, and so the total number of false positives will actually far outweigh the true positive results. For example, suppose that one million people take the test. About one of these people actually has the disease, and that person is 99 percent likely to test positive. So we expect about one true positive result. And for the others, who don’t have the disease, 99 percent of them will test negative. In other words, about 990,000 people will test negative. The remaining 10,000 people, however, one percent of the total, will have false positive results! So you are in this group of 10,000 people who tested positive, with only one of them actually having the disease. So the odds that you actually have the disease are only about one in ten thousand.

Quarters to the moon. How many quarters would you have to stack to reach the moon? How much would it be worth? how much would it weigh? More than the earth? More than the Empire state building? If you put all those quarters into a pile, would it fit in Central Park?

Well, OK, the question is a little absurd, and there are all kinds of problematic issues: we couldn’t ever actually make such a tower of quarters, as it would topple over; it doesn’t make sense to build a tower to the moon, since the moon is in orbit around the earth and it is moving; the quarters would begin to orbit the earth themselves, or fall back down to the earth or to the moon. But let’s just try to ignore all those problematic issues, and just try to answer how many quarters it would take to make a pile that high.

How far is the moon? I don’t really know, and we could look it up to see what the astronomers say about it, but that would go against the back-of-the-envelope spirit of these problems. So I am just going to use two facts that I picked up years ago: first, I know that the speed of light is about 300,000 km per second; and second, I happen to know that it takes radio signals traveling at the speed of light about one second to get to the moon (eight minutes to the sun), and so the moon is about one light-second away. I remember this fact from having learned it in my childhood, because it was important in a science fiction story I had read, in which radio communication from earth to the moon base had a two-second delay: you send a message, it takes a second to get there, and then a second for the answer to come back. So the moon is one light second away, and light travels 300,000 km in one second. So the moon is about 300,000 kilometers distant (actual value: 387400 km).

Let’s now stack up the quarters. By eyeballing a quarter I had in my pocket, I’d say each quarter is about 2 mm thick, which means 50 quarters per meter, or 50,000 quarters per km. So altogether, we have $300,000\times50,000=15,000,000,000$ many quarters in the stack. Fifteen billion quarters to the moon! It takes four quarters to make a dollar, so that is worth about $4 billion dollars, which may be much less than you would have expected initially.

Let’s put those quarters in a big pile, packed as tightly as possible. Each quarter is a little over 2 cm in diameter, and so the volume of the rectangular solid bounding the quarter is a little more than 2 cm by 2 cm by 2 mm, or about 800 cubic mm, which we can round up to 1 cubic centimeter. That makes sense, because we can imagine folding a quarter up into a little 1 cm cube. So 1000 quarters takes up about one liter. Thus, our quarters-to-the-moon have a volume of about 15 million liters. There are 1000 liters in a cubic meter, and so this is 15,000 cubic meters. Since a cube 10 meters on a side, has a volume of 1000 cubic meters, we can fit all those quarters into fifteen such cubes. I can easily imagine an art instalation, with fifteen such 10 meter cubes placed in Sheep Meadow in Central Park. They would definitely fit.

How much would the quarters weigh? Some of the students in my lecture had worked as cashiers, and so they were familiar with quarter rolls from the bank, which fit in your palm and have ten dollars worth of quarters, or forty quarters. I can imagine a quarter roll in one hand balancing the weight of a half-pound of gouda cheese at the deli in my other hand. So 40 quarters is .5 pound, which makes 100 quarters about 2.2 pounds, which is about 1 kg. We had fifteen billion quarters, which is 150 million times 100 quarters, so 150 million kilograms.

Since we already imagined the quarters in Sheep Meadow in those fifteen large boxes, clearly they weigh far less than the earth and also much less than the Empire State Building.

How much does the Empire State Building weigh? After posing this question, I found out that it is also evidently a famous Google interview question, and you can easily find other blog posts about it. But let me tell you how I thought about it. The Empire State Building is about 100 stories tall, and if we assume 12-15 feet per floor, or about 4 meters, this makes the total height (without tower) about 400 meters. The CUNY Graduate Center where I work is across the corner on Fifth Avenue and 34th street, and I have walked past the Empire State Building many times. I estimate that the base is about 75 meters on Fifth Avenue by 150 meters on 34th Street. But because of the step-backs, the middle part of the tower is considerably less than that, probably 30 by 80 near the top. Let’s say the average cross section of the building is 50 by 100 meters. Now, how much does that weigh? Of course, there is a lot of steel and masonary in the floors and walls, and concrete floors, and also all the contents of the building, with desks and paper files and books and interior walls and doors. Those things are heavy. I really have no idea how much those things weigh, but let me imagine that we build a virtual copy of a complete floor from the Empire State Building, without any of those floor beams or concrete or desks or walls or anything, and instead we flood that virtual room with water, until it has the same weight. Of course, the metal and concrete and masonary is much heavier than water, but the actual space is mostly empty air. How much water would it take to have the same weight? Well, let me just guess that we’d have to flood the virtual room about one-quarter deep with water to have the same weight as the actual materials. At 50 by 100 meters by 4 meters tall for each floor, this would mean a pool of water 50 by 100 meters, one meter deep, for a volume of 5000 cubic meters of water. We already said that one cubic meter of water is 1000 liters, each of which weighs one kilogram, so we are talking about $5000\times 1000$ kilograms or 5 million kilograms per floor. With 100 floors, this is 500 million kilograms, or 500,000 metric tons. So according to this estimate, the Empire State Building with all its contents weighs about 500,000 metric tons.

So it weighs more than the quarters to the moon (150 million kilograms), but not as much more as I would have thought based on the art exhibition in Sheep Meadow! The Empire state building weighs about three or four times as much as the quarters to the moon.

How many babies are born each day in New York City? The population of New York is approximately ten million people. And those people live about 70 years. Let’s imagine spreading those births out uniformly over the 70 year period. That means one million people born in 7 years. This is about 150,000 people born in one year, which is about 3000 births per week, or about 400 births per day. This estimate would be accurate if the city population were in a steady state, with births balancing deaths, but of course the population is increasing. On the other hand, most of that increase is from people moving here, not just being born here. But then again, the people moving to New York I expect are mainly young adults looking for a career, and so they will contribute to a heightened birth rate as they have children. Shall we add 25 percent more to account for the fact that the city is growing and not in a steady state? In that case, our estimate is that there are about 500 births each day in New York City.

There is an endless supply of such questions that can be calculated. I have been talking about them in the lectures for my course Math for Liberal Arts, an undergraduate course aimed at non-math students at my college. In this course, we are fairly free to cover some imaginative topics, and I’ve covered some game theory, some elementary graph theory, a little bit logic, and now this number sense topic, which I use to try to develop the students ability to attack a technical problem by breaking it down into more managable problems. But meanwhile, I also look upon it just as plain fun.

So here are a few more questions that you might enjoy thinking about. And please make your own.

  • What is the total volume of air that you have breathed in your lifetime? If you filled a balloon with all your hot air, how big would it be?
  • If you used that balloon as a hot-air balloon (with the hot air at your body temperature), would it have enough bouyency to lift you and all your possessions? See my post A lifetime of hot air for a detailed answer.
  • How many NYC metro cards exist? (Each lasts about a year or two; they get thrown out, but still exist in the landfill; the system has used metro cards for about twenty years; there is a large stock of new not-yet-used cards.)
  • How many doorknobs are there in your building?
  • How many subway tiles are there in the NYC subway system?
  • How many riders were on my crowded 4 train downtown this morning?
  • How many lightswitches are there on your university campus?
  • How much water does NYC use each day?
  • What are the odds that you drank a molecule of water that was once also drank by Julius Caesar?
  • How many people have there been? What fraction are currently alive?

Please try to figure them out, and post your solutions in the comments below!

Are all Gödel sentences equivalent?

By fdecomiteAnetode at en.wikipedia (Tunnels of Time Transferred from en.wikipedia) [CC BY 2.0 (], from Wikimedia CommonsI should like to consider a family of natural questions concerning the Gödel sentence — the sentence asserting its own non-provability, used by Gödel to prove the incompleteness theorem — and specifically the question whether we are really entitled to speak of the Gödel sentence. Is it unique? Is it unique up to equivalence? Up to provable equivalence? Could there be more than one Gödel sentence, perhaps for different manners of arithmetizing the syntax? Are they all provably equivalent? These questions have come up a number of times in various contexts, and since James Propp just sent me an email inquiry about it yesterday, let me address the questions here in reply (he is planning an upcoming post in a few weeks about incompleteness and self-reference — check back later for the link). I shall make use only of elementary classical incompleteness ideas, and I assume this has all been known for some time.

For definiteness, let us take first-order Peano Arithmetic ($\newcommand\PA{\text{PA}}\PA$) as the base theory over which we are considering provability, although similar observations can be made concerning other base theories. By formalizing the syntax in arithmetic using Gödel coding, we may easily produce an arithmetically expressible provability predicate $\newcommand\Prov{\text{Prov}}\Prov(x)$, which asserts that $x$ is the Gödel code of a $\PA$-provable sentence. By using this predicate and the fixed-point lemma, we may construct a sentence $\psi$ that asserts its own non-provability, meaning precisely: $\PA\vdash\psi\leftrightarrow\neg\Prov(\ulcorner\psi\urcorner)$. Such a sentence $\psi$ is known as the Gödel sentence, and we may use it to prove Gödel’s first incompleteness theorem as follows. Namely, it is easy to see that $\psi$ cannot be provable in $\PA$, for if it were, then in virtue of what $\psi$ asserts, we will have also proved that $\psi$ is not provable, contrary to fact. So $\psi$ is not provable, and therefore we see that indeed $\psi$ is actually true. So the sentence $\psi$ is true, yet unprovable. By paying a little closer attention to the argument, what we have actually argued is that $\PA$ itself proves that if $\newcommand\Con{\text{Con}}\Con(\PA)$, then $\psi$.

Of course, there are many fixed points, and for example any sentence that is provably equivalent to $\psi$, such as $\psi\wedge\psi$, is also provably equivalent to its own non-provability, and therefore serves as yet another Gödel sentence. In this trivial sense, there are infinitely many different Gödel sentences. But these particular sentences, by construction, are provably equivalent. Are all the Gödel sentences equivalent?

Indeed, a bit more generally, suppose that we have two implementations of the provability predicate, using perhaps different formalizations of the syntax, with $\varphi$ and $\psi$ being fixed points of non-provability-in-$\PA$, so that $\varphi$ and $\psi$ each assert their own non-provability with respect to those predicates. Must $\PA$ prove that $\varphi$ and $\psi$ are equivalent?

The first, main answer is that yes, indeed, if we have used a natural provability predicate, then all the Gödel sentences are provably equivalent, and this remains true even for different natural manners of formalizing the syntax. In this sense, therefore, there really is only one Gödel sentence and we are entitled to speak of the Gödel sentence. The reason is that every such Gödel sentence is provably equivalent to the assertion $\Con(\PA)$, and for natural formalizations of the syntax, meaning implementations where we can provably translate proofs from one system to another and the meta-theory, these assertions are all equivalent. To see this, suppose that $\psi$ is a Gödel sentence, which means that $\psi$ asserts its own non-provability. Since $\psi$ implies that there is a non-provable sentence, namely $\psi$ itself, it follows immediately that $\psi$ implies $\Con(\PA)$; conversely, if $\Con(\PA)$ holds, then the argument of the first incompleteness theorem that I mentioned above shows that $\psi$ is true. So $\PA\vdash\Con(\PA)\leftrightarrow\psi$. So all the fixed points are equivalent to the assertion $\Con(\PA)$. For systems where we can provably translate proofs from one system to another, the corresponding consistency assertions $\Con(\PA)$ are equivalent. And so it is fine to speak of the Gödel sentence.

But let me now discuss what happens if we should implement a more exotic form of the provability predicate. For example, consider the Rosser provability predicate, where we say that a sentence $\sigma$ is Rosser provable, if there is a proof of $\sigma$, but no shorter proof of $\neg\sigma$, meaning no proof of $\neg\sigma$ with a smaller Gödel code. The Rosser sentence $\rho$ is a non-provability fixed point of this notion, saying, “I am not Rosser provable.” Since Rosser provability is intuitively a stronger notion of provability, it would be reasonable to expect that the Rosser non-provability assertion is weaker than the Gödel sentence, and indeed that is the case. If the base theory is consistent, then the Rosser sentence cannot be provable, since if it were, then there would have to be a smaller proof of $\neg\rho$, violating consistency; but also, $\neg\rho$ cannot be provable, since this sentence implies immediately that $\rho$ is also provable with a shorter proof, in light of what $\rho$ asserts. So $\PA$ proves that $\Con(\PA)$ implies both $\Con(\PA+\rho)$ and $\Con(\PA+\neg\rho)$, and moreover, $\PA\vdash\Con(\PA)\to\rho$. But since $\PA\vdash\Con(\PA)\to\Con(\PA+\rho)$, it follows that $\PA$, if consistent, cannot prove the converse, $\rho\to\Con(\PA)$, since otherwise $\PA+\rho$ would prove its own consistency, in violation of the second incompleteness theorem. So the Rosser sentence $\rho$ is strictly weaker than $\Con(\PA)$ over $\PA$, and hence also strictly weaker than the usual Gödel sentence. But since the Rosser sentence $\rho$ is itself a fixed point of non-provability with respect to the Rosser formalization of provability, it shows that exotic formalizations of the provability predicate can indeed give rise to inequivalent fixed-point assertions.

Note that the Rosser provability predicate does not sustain provable translations to any of the usual (natural) formalizations of provability, because we cannot prove in $\PA$ as a general statement that whenever a statement is provable, then it is Rosser provable. We can, however, generally translate specific proofs to Rosser proofs, and to the meta-theory, assuming that the base theory is consistent.

Lastly, let’s consider a somewhat more exotic provability predicate, arising from the Feferman-style enumerations of $\PA$, which are arithmetically definable but not computably axiomatizable. Specifically, consider the axiomatization of $\PA$, where we continue to add the usual $\PA$ axioms, but only so long as the resulting theory remains consistent. This way of describing the theory gives rise to a corresponding provability predicate, which expresses provability in that enumerated theory. So sentence $\theta$ is Feferman provable, if it is provable using axioms that come from a consistent initial segment of the $\PA$ axiomatization. If $\phi$ is a Feferman-non-provability fixed point, then $\PA$ proves that $\phi$ is equivalent to the assertion that $\phi$ is not Feferman provable. Since $\PA$ easily proves that the Feferman theory is consistent, and also that any finite collection of $\PA$ axioms are part of the Feferman theory, it follows that the negation of any particular theorem of $\PA$ is a Feferman-non-provability fixed point, since $\PA$ proves that it is false and also that it is not provable in the Feferman theory. But those statements are not equivalent to the usual Gödel sentence, since they are inconsistent with $\PA$.

My very first lemma, which also happened to involve a philosophical dispute

Doppelverh-zentralproj.svgLet me recall the time of my very first lemma, which also happened to involve a philosophical dispute.

It was about 35 years ago; I was a kid in ninth grade at McKinley Junior High School, taking a class in geometry, taught by a charismatic math teacher. We were learning how to do proofs, which in that class always consisted of a numbered list of geometrical assertions, with a specific reason given for each assertion, either stating that it was “given” or that it followed from previous assertions by a theorem that we had come to know. Only certain types of reasons were allowed.

My instructor habitually used the overhead projector, writing on a kind of infinite scroll of transparency film, which he could wind up on one end of the projector, so as never to run out of room. During the semester, he had filled enough spools, it seemed, to fill the library of Alexandria.

One day, it came to be my turn to present to the rest of the class my proof of a certain geometrical theorem I had been assigned. I took the black marker and drew out my diagram and theorem statement. In my proof, I had found it convenient to first establish a certain critical fact, that two particular line segments in my diagram were congruent $\vec{PQ}\cong\vec{RS}$. In order to do so, I had added various construction lines to my diagram and reasoned with side-angle-side and so on.

Having established the congruency, I had then wanted to continue with my proof of the theorem. Since the previous construction lines were cluttering up my diagram, however, I simply erased them, leaving only my original diagram.

The class erupted with objection!  How could I sensibly continue now with my proof, claiming that $\vec{PQ}\cong\vec{RS}$, after I had erased the construction lines? After all, are those lines segments still congruent once we erase the construction lines that provided the reason we first knew this to be true? Many of the students believed that my having erased the construction lines invalidated my proof.

So there I was, in a ninth-grade math class, making a philosophical argument to my fellow students that the truth of the congruence $\vec{PQ}\cong\vec{RS}$ was independent of my having drawn the construction lines, and that we could rely on the truth of that fact later on in my proof, even if I were to erase those construction lines.

After coming to an uneasy, tentative resolution of this philosophical dispute, I was then allowed to continue with the rest of my proof, establishing the main theorem.

I realized only much later that this had been my very first lemma, since I had isolated a mathematically central fact about a certain situation, proving it with a separate argument, and then I had used that fact in the course of proving a more general theorem.