I was fascinated recently to discover something I hadn’t realized about relative interpretability in set theory, and I’d like to share it here. Namely,

Different set theories extending ZF are never bi-interpretable!

For example, ZF and ZFC are not bi-interpretable, and neither are ZFC and ZFC+CH, nor ZFC and ZFC+$\neg$CH, despite the fact that all these theories are equiconsistent. The basic fact is that there are no nontrivial instances of bi-interpretation amongst the models of ZF set theory. This is surprising, and could even be seen as shocking, in light of the philosophical remarks one sometimes hears asserted in the philosophy of set theory that what is going on with the various set-theoretic translations from large cardinals to determinacy to inner model theory, to mention a central example, is that we can interpret between these theories and consequently it doesn’t much matter which context is taken as fundamental, since we can translate from one context to another without loss.

The bi-interpretation result shows that these interpretations do not and cannot rise to the level of bi-interpretations of theories — the most robust form of mutual relative interpretability — and consequently, the translations inevitably must involve a loss of information.

To be sure, set theorists classify the various set-theoretic principles and theories into a hierarchy, often organized by consistency strength or by other notions of interpretative power, using forcing or definable inner models. From any model of ZF, for example, we can construct a model of ZFC, and from any model of ZFC, we can construct models of ZFC+CH or ZFC+$\neg$CH and so on. From models with sufficient large cardinals we can construct models with determinacy or inner-model-theoretic fine structure and vice versa. And while we have relative consistency results and equiconsistencies and even mutual interpretations, we will have no nontrivial bi-interpretations.

(I had proved the theorem a few weeks ago in joint work with Alfredo Roque Freire, who is visiting me in New York this year. We subsequently learned, however, that this was a rediscovery of results that have evidently been proved independently by various authors. Albert Visser proves the case of PA in his paper, “Categories of theories and interpretations,” Logic in Tehran, 284–341, Lect. Notes Log., 26, Assoc. Symbol. Logic, La Jolla, CA, 2006, (pdf, see pp. 52-55). Ali Enayat gave a nice model-theoretic argument for showing specifically that ZF and ZFC are not bi-interpretable, using the fact that ZFC models can have no involutions in their automorphism groups, but ZF models can; and he proved the general version of the theorem, for ZF, second-order arithmetic $Z_2$ and second-order set theory KM in his 2016 article, A. Enayat, “Variations on a Visserian theme,” in Liber Amicorum Alberti : a tribute to Albert Visser / Jan van Eijck, Rosalie Iemhoff and Joost J. Joosten (eds.) Pages, 99-110. ISBN, 978-1848902046. College Publications, London. The ZF version was apparently also observed independently by Harvey Friedman, Visser and Fedor Pakhomov.)

Meanwhile, let me explain our argument. Recall from model theory that one theory $S$ is *interpreted* in another theory $T$, if in any model of the latter theory $M\models T$, we can define (and uniformly so in any such model) a certain domain $N\subset M^k$ and relations and functions on that domain so as to make $N$ a model of $S$. For example, the theory of algebraically closed fields of characteristic zero is interpreted in the theory of real-closed fields, since in any real-closed field $R$, we can consider pairs $(a,b)$, thinking of them as $a+bi$, and define addition and multiplication on those pairs in such a way so as to construct an algebraically closed field of characteristic zero.

Two theories are thus *mutually interpretable*, if each of them is interpretable in the other. Such theories are necessarily equiconsistent, since from any model of one of them we can produce a model of the other.

Note that mutual interpretability, however, does not insist that the two translations are inverse to each other, even up to isomorphism. One can start with a model of the first theory $M\models T$ and define the interpreted model $N\models S$ of the second theory, which has a subsequent model of the first theory again $\bar M\models T$ inside it. But the definition does not insist on any particular connection between $M$ and $\bar M$, and these models need not be isomorphic nor even elementarily equivalent in general.

By addressing this, one arrives at a stronger and more robust form of mutual interpretability. Namely, two theories $S$ and $T$ are *bi-interpretable*, if they are mutually interpretable in such a way that the models can see that the interpretations are inverse. That is, for any model $M$ of the theory $T$, if one defines the interpreted model $N\models S$ inside it, and then defines the interpreted model $\bar M$ of $T$ inside $N$, then $M$ is isomorphic to $\bar M$ by a definable isomorphism in $M$, and uniformly so (and the same with the theories in the other direction). Thus, every model of one of the theories can see exactly how it itself arises definably in the interpreted model of the other theory.

For example, the theory of linear orders $\leq$ is bi-interpretable with the theory of strict linear order $<$, since from any linear order $\leq$ we can define the corresponding strict linear order $<$ on the same domain, and from any strict linear order $<$ we can define the corresponding linear order $\leq$, and doing it twice brings us back again to the same order.

For a richer example, the theory PA is bi-interpretable with the finite set theory $\text{ZF}^{\neg\infty}$, where one drops the infinity axiom from ZF and replaces it with the negation of infinity, and where one has the $\in$-induction scheme in place of the foundation axiom. The interpretation is via the Ackerman encoding of hereditary finite sets in arithmetic, so that $n\mathrel{E} m$ just in case the $n^{th}$ binary digit of $m$ is $1$. If one starts with the standard model $\mathbb{N}$, then the resulting structure $\langle\mathbb{N},E\rangle$ is isomorphic to the set $\langle\text{HF},\in\rangle$ of hereditarily finite sets. More generally, by carrying out the Ackermann encoding in any model of PA, one thereby defines a model of $\text{ZF}^{\neg\infty}$, whose natural numbers are isomorphic to the original model of PA, and these translations make a bi-interpretation.

We are now ready to prove that this bi-interpretation situation does not occur with different set theories extending ZF.

**Theorem.** Distinct set theories extending ZF are never bi-interpretable. Indeed, there is not a single model-theoretic instance of bi-interpretation occurring with models of different set theories extending ZF.

**Proof.** I mean “distinct” here in the sense that the two theories are not logically equivalent; they do not have all the same theorems. Suppose that we have a bi-interpretation instance of the theories $S$ and $T$ extending ZF. That is, suppose we have a model $\langle M,\in\rangle\models T$ of the one theory, and inside $M$, we can define an interpreted model of the other theory $\langle N,\in^N\rangle\models S$, so the domain of $N$ is a definable class in $M$ and the membership relation $\in^N$ is a definable relation on that class in $M$; and furthermore, inside $\langle N,\in^N\rangle$, we have a definable structure $\langle\bar M,\in^{\bar M}\rangle$ which is a model of $T$ again and isomorphic to $\langle M,\in^M\rangle$ by an isomorphism that is definable in $\langle M,\in^M\rangle$. So $M$ can define the map $a\mapsto \bar a$ that forms an isomorphism of $\langle M,\in^M\rangle$ with $\langle \bar M,\in^{\bar M}\rangle$. Our argument will work whether we allow parameters in any of these definitions or not.

I claim that $N$ must think the ordinals of $\bar M$ are well-founded, for otherwise it would have some bounded cut $A$ in the ordinals of $\bar M$ with no least upper bound, and this set $A$ when pulled back pointwise by the isomorphism of $M$ with $\bar M$ would mean that $M$ has a cut in its own ordinals with no least upper bound; but this cannot happen in ZF.

If the ordinals of $N$ and $\bar M$ are isomorphic in $N$, then all three models have isomorphic ordinals in $M$, and in this case, $\langle M,\in^M\rangle$ thinks that $\langle N,\in^N\rangle$ is a well-founded extensional relation of rank $\text{Ord}$. Such a relation must be set-like (since there can be no least instance where the predecessors form a proper class), and so $M$ can perform the Mostowski collapse of $\in^N$, thereby realizing $N$ as a transitive class $N\subseteq M$ with $\in^N=\in^M\upharpoonright N$. Similarly, by collapsing we may assume $\bar M\subseteq N$ and $\in^{\bar M}=\in^M\upharpoonright\bar M$. So the situation consists of inner models $\bar M\subseteq N\subseteq M$ and $\langle \bar M,\in^M\rangle$ is isomorphic to $\langle M,\in^M\rangle$ in $M$. This is impossible unless all three models are identical, since a simple $\in^M$-induction shows that $\pi(y)=y$ for all $y$, because if this is true for the elements of $y$, then $\pi(y)=\{\pi(x)\mid x\in y\}=\{x\mid x\in y\}=y$. So $\bar M=N=M$ and so $N$ and $M$ satisfy the same theory, contrary to assumption.

If the ordinals of $\bar M$ are isomorphic to a proper initial segment of the ordinals of $N$, then a similar Mostowski collapse argument would show that $\langle\bar M,\in^{\bar M}\rangle$ is isomorphic in $N$ to a transitive set in $N$. Since this structure in $N$ would have a truth predicate in $N$, we would be able to pull this back via the isomorphism to define (from parameters) a truth predicate for $M$ in $M$, contrary to Tarski’s theorem on the non-definability of truth.

The remaining case occurs when the ordinals of $N$ are isomorphic in $N$ to an initial segment of the ordinals of $\bar M$. But this would mean that from the perspective of $M$, the model $\langle N,\in^N\rangle$ has some ordinal rank height, which would mean by the Mostowski collapse argument that $M$ thinks $\langle N,\in^N\rangle$ is isomorphic to a transitive set. But this contradicts the fact that $M$ has an injection of $M$ into $N$. $\Box$

It follows that although ZF and ZFC are equiconsistent, they are not bi-interpretable. Similarly, ZFC and ZFC+CH and ZFC+$\neg$CH are equiconsistent, but no pair of them is bi-interpretable. And again with all the various equiconsistency results concerning large cardinals.

A similar argument works with PA to show that different extensions of PA are never bi-interpretable.

Hi Joel, I just saw your post, and wanted to point out that I wrote a paper a couple of years ago that establishes the same result (and some others). I found out that Harvey Friedman and Albert Visser has also discovered the same result (for ZF), and after writing my paper Fedor Pakhomov from Moscow also informed me that he had also discovered it independently.

My paper can be found on the arXiv through https://arxiv.org/abs/1702.07093, and it was also published with the following coordinates:

A. Enayat, “Variations on a Visserian theme”, in Liber Amicorum Alberti : a tribute to Albert Visser / Jan van Eijck, Rosalie Iemhoff and Joost J. Joosten (eds.) Pages, 99-110. ISBN, 978-1848902046. Publisher, College Publications. Place of publication, London. Publication year, 2016.

Thanks very much for this reference, which I shall add to the main post. It’s clearly past time for the result to become common knowledge!

Joel, what’s also fascinating about the result (both for PA, ZF, and their higher order analogues such as second order number theory and Kelley-Morse theory of classes) is that the full power of the theory seems to be necessary in the proof, which makes one wonder if there is some sort of reversal at work for each case.

For example, is it true that if T is some weak set theory (e.g. KP or even weaker) such that any two distinct consistent extensions of T are bi-interpretable, then T must already be an extension of ZF? If the answer is positive, then we would have a natural structural characterization of ZF. I have posed a more general form of this question in the last section of my paper mentioned in my earlier comment.

That is a very interesting question.

Two more comments: the sentence in your current text that says “We subsequently learned, however, that this was a rediscovery of results of Albert Visser” is correct for T = PA (the result at the end of the blog); but for T = ZF, the result was established independently by me (and published), and by the unpublished joint work of Visser and Friedman, and by Pakhomov.

In my published work I also extended the result to higher order analogues of PA and ZF (in particular to second order number theory and Kelley-Morse theory of classes).

OK, I’ll change this phrasing.

Hi Joel, sorry for being pedantic, but the ZF-result does not follow at all from any of the results in Visser’s paper, his paper only establishes the PA case and does not even implicitly establish the ZF case.

OK, I’m sorry for the confusion. I’ll look into this and get the credit straight.

Thanks Joel.

Let me also add (for the sake of posterity) that in my previous comment when I wrote “his paper only establishes the PA case” (in connection with Visser’s paper), the “only” refers to the intersection of the content of your blog and Visser’s paper, and that Visser’s paper houses many results and suggestions for further research prompted by a category-theoretic view of interpretations.

Hi Joel,

it was nice seeing you the other day. and this is a nice result!

having said this, however, let me just say that it doesn’t really have much to do with what you say is shocking. in fact i am not sure why it is shocking for the reasons you outlined. it could be shocking for some other reasons, but to me, i don’t see the connection.

for one thing, when one speaks of bi-interpretability one usually allows forcing extensions…

but more importantly, i really do not think the formal version of the bi-interpretability claim is of any mathematical significance whatsoever. bi-interpretability claim, to me at least, consists of set of problems that, if solved, would imply deep connection between various seemingly different areas of set theory.

for instance, MM(c) can be forced over models of AD_R+Theta is regular (Woodin). MM implies that there are canonical models of AD_R+Theta is regular (Trang) and of LSA (myself and Trang).

the real problem here is whether one can show that MM implies Woodin’s (*), or whether MM can hold in a generic extension of some model of determinacy. these will be true pillars of pure set theory, i think.

as far as philosophy behind this stuff goes, i think that it is the least interesting aspect of all of it. but if one cares, let me just say that to me, math is more like science, i am doing math like a scientist would do science, i draw pictures, i do examples, i find relations between the objects i investigate, and i generalize. i understand that others maybe doing math differently, but if one has this sort of view of math then the point just is that your experience might lead you to believing that axiom A is true, while another might be led to believing that axiom B is true, and this we have witnessed in physics. when i think about set theory, i have to really try hard to think in a way that determinacy fails…so for me, choice is never given. unlike Saharon who always thinks that choice is given. this is a result of what we have been doing separately, but it is hardly the case that we don’t understand each other….

then the substantial claim is that it is only a psychological illusion that A and B are disjoint, provided that both explain sufficiently many experiences, people who believe in A can understand and appreciate what people who believe in B do and vice a versa, at least the A person can understand why the B person sees the mathematical universe the way he or she sees it and vice a versa. there are precise conjectures that when resolved could support this view. i think the formal version of this view is not interesting, at least to me.

it is like Englsih vs Armenian, i believe that there is no formal translation that preserves full meaning between English and Armenian, but given enough time and exposure, i can explain the true meaning of every Armenian expression to someone who grew up in the states. That might require me putting the person in an Armenian environment for a long enough time, but the person will eventually get it (he or she has to also be open to learning the meaning of Armenian expressions). i am not sure what the formal version of such a claim is, and i am not sure if it is interesting to formalize it, it is certainly important to have words-for-words kind of translations, but as we all know, this doesn’t really do it.

Oh yes, I enjoyed speaking at Rutgers on the universal finite set. Philip Welch and Kameryn Williams and I are now proving a $\Sigma_1$ version for models of V=L. That is, there is a universal $\Sigma_1$-definable finite sequence of ordinals, which can be any sequence, and in any countable model of a suitable strengthening of V=L, there is an L-extension in which you can add any new elements to the sequence.

About your remarks here, the way I think about it is that we seem to have a variety of foundational frameworks in set theory and mathematics, and one can seem sometimes to interpret those frameworks within one another, either by going to definable inner models or forcing extensions, as you say. The gold standard for two such foundational theories to be “equivalent” or just-as-good in foundational matters, would be bi-interpretation. And so I find it extremely interesting that there basically are no nontrivial instances of bi-interpretation amongst the strengthenings of ZF.