A model of set theory with a definable copy of the complex field in which the two roots of -1 are set-theoretically indiscernible

Mathematicians are deeply familiar with the complex number field $\newcommand\C{\mathbb{C}}\C$, the algebraic closure of the real field $\newcommand\R{\mathbb{R}}\R$, which can be constructed from $\R$ by adjoining a new ideal element $i$, the imaginary unit, and forming the complex numbers $a+bi$ as formal pairs, defining the arithmetic subject to the rule $i^2=-1$. Thus we may add and multiply the complex numbers according to the familiar rules:

$$(a+bi)+(c+di)=(a+c)+(b+d)i$$ $$(a+bi)\cdot(c+di)=(ac-bd)+(ad+bc)i.$$

The complex field thus provides a system of numbers giving sense to expressions like $\sqrt{-1}$, while obeying the familiar algebraic rules of a field. Hamilton presented this conception of complex numbers as pairs of real numbers to the Royal Irish Academy in 1833.

One may easily observe in the complex numbers, however, that $-i$ is also a square root of $-1$, because

$$(-i)\cdot(-i)=(-1)^2\cdot i^2=i^2=-1.$$

Thus, both $i$ and $-i$ have the property of being square roots of $-1$, and indeed, these are the only square roots of $-1$ in the complex field.

A small conundrum may arise when one realizes that $-i$ therefore also fulfills what might have been taken as the “defining” property of the ideal element $i$, namely, that it squares to $-1$. So this property doesn’t actually define $i$, in light of the fact that there is another distinct object $-i$ that also has this property. Can we tell $i$ and $-i$ apart?

Not in the complex field, no, we cannot. The basic fact is that $i$ and $-i$ are indiscernible as complex numbers with respect to the algebraic structure of $\C$: any property that $i$ has in the structure $\langle\C,+,\cdot,0,1\rangle$ will also hold of $-i$. One way to see this is to observe that complex conjugation, the map $$a+bi\quad\mapsto\quad a-bi$$ is an automorphism of the complex number field, an isomorphism of the structure with itself. And since this automorphism swaps $i$ with $-i$, it follows that any statement true of $i$ in the complex numbers, expressible in the language of fields, will also hold of $-i$.
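As a quick check of the multiplicative part of this claim, using only the multiplication rule displayed earlier, one can conjugate a product and compare it with the product of the conjugates. Writing $\bar z$ for the conjugate of $z$ (a notation introduced here only for this check), the two computations agree term by term:

$$\overline{(a+bi)\cdot(c+di)}=\overline{(ac-bd)+(ad+bc)i}=(ac-bd)-(ad+bc)i$$

$$(a-bi)\cdot(c-di)=(ac-bd)+(-ad-bc)i=(ac-bd)-(ad+bc)i.$$

The additive case is similar and easier, and since conjugation fixes $0$ and $1$ and is its own inverse, it is a bijection preserving the whole field structure.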

In fact, the complex number field $\C$ has an extremely rich automorphism group, and every irrational complex number is indiscernible from various doppelgängers. There is an automorphism of $\C$ that swaps $\sqrt{2}$ and $-\sqrt{2}$, for example, and another that permutes the cube roots of $5$, mixing the real root $\sqrt[3]{5}$ with the two nonreal roots. So these numbers can have no property not shared by their various automorphic images. The general fact is that every complex number, except the rational numbers, is moved by some automorphism of $\C$. One can begin to see this by noticing that there are two ways to embed the algebraic field extension $\newcommand\Q{\mathbb{Q}}\Q(\sqrt{2})$ into $\C$, and both embeddings extend fully to automorphisms of $\C$.

Because there is an automorphism of $\C$ swapping $\sqrt{2}$ and $-\sqrt{2}$, these two numbers are also indiscernible as complex numbers, just as $i$ and $-i$ were: any property that $\sqrt{2}$ has in the complex numbers also holds of $-\sqrt{2}$. But wait a minute, how can that be? After all, $\sqrt{2}$ is positive and $-\sqrt{2}$ is negative, and isn’t this a property that separates them? Well, yes, in the real numbers $\R$ this is a separating property, and since the order is definable from the algebraic structure of the real field (positive numbers are exactly the nonzero squares), it is a real algebraic property that distinguishes $\sqrt{2}$ from $-\sqrt{2}$, as only the former has itself a square root in $\R$. But this definition does not work in $\C$, since both have square roots there, and more generally, the surprise is that the real numbers $\R$ are not definable as a subfield in the complex field $\C$: there is no property expressible in the language of fields that picks out exactly the real numbers. There are $2^{2^{\aleph_0}}$ many distinct ways to embed $\R$ as a subfield of $\C$, and none of them is definable in $\C$.
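To make the parenthetical remark about the order concrete, here is one way to write its definition in the language of fields. This particular formula is only an illustrative choice, and other equivalent definitions are possible.

$$x<y\quad\iff\quad \exists z\,\bigl(z\neq 0\ \wedge\ x+z\cdot z=y\bigr)$$

In $\R$ this formula holds exactly when $y-x$ is a nonzero square, that is, when $x<y$. In $\C$, where every number is a square, the same formula holds of every pair of distinct numbers and so defines nothing more than $x\neq y$.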

The conclusion is that if we regard the complex numbers with the field structure only, $\langle\C,+,\cdot,0,1\rangle$, then we cannot refer unambiguously to $i$ or $-i$, to $\sqrt{2}$ or $-\sqrt{2}$, or indeed to any irrational complex number. Every irrational number is moved by some automorphism of the complex field. Each irrational algebraic number can be moved to any other root of its irreducible polynomial, so the finitely many roots of that polynomial are mutually indiscernible, and any two transcendental complex numbers (transcendental over $\Q$) are automorphic. For example, there is an automorphism of $\C$ moving $e+2i$ to $1+\sqrt{\pi}i$.

Finding a path out of that chaos, mathematicians like to conceive of $\C$ as a field extension of $\R$, in effect fixing the copy of $\R$ in $\C$. It is as though we are working in the structure $\langle\C,+,\cdot,0,1,\R\rangle$, where we have augmented the complex field structure with a predicate picking out the real numbers. So this isn’t just a field, but a field with an identified subfield. In this structure, $\sqrt{2}$ and $\sqrt[3]{5}$ and so on are definable, since one has identified the real numbers and within that subfield the order on the reals is definable, and so we can define every real algebraic number using this order. With the predicate for $\R$ picking out the reals, the structure has only the one nontrivial automorphism, complex conjugation, and to my way of thinking, this is the reason that the indiscernibility issue is usually considered more prominently with $i$ and $-i$.
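For instance, here is a sketch of how $\sqrt{2}$ might be defined in this expanded structure, writing $\R(x)$ for the predicate asserting that $x$ is real. The specific formula is an illustrative assumption, one of many that would work.

$$x=\sqrt{2}\quad\iff\quad \R(x)\ \wedge\ x\cdot x=1+1\ \wedge\ \exists y\,\bigl(\R(y)\ \wedge\ y\cdot y=x\bigr)$$

The first two conjuncts narrow the possibilities to $\sqrt{2}$ and $-\sqrt{2}$, and the last conjunct selects the positive one, since only it is the square of a real number.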

The indiscernibility of $i$ and $-i$ in the complex field has been written on at length in the philosophical literature, since it seems to refute a certain philosophical account of structuralism that might otherwise have seemed appealing. Namely, the relevant view is a version of abstract structuralism, the view that what mathematical objects are is the structural role that they play in a mathematical system. On this view the natural number $2$ simply is the role that $2$ plays in Dedekind arithmetic, the role of being the successor of the successor of zero (Dedekind arithmetic is the categorical second-order axiomatization of $\langle\newcommand\N{\mathbb{N}}\N,0,S\rangle$). The view is that what mathematical structure is is the structural roles that objects play in any instance of the structure. The structural role is exactly what is preserved by isomorphism, and so it would seem to be an invariant for the isomorphism orbits of an individual with respect to a structure.

The problem with this version of abstract structuralism is that it seems to be refuted by the example of $i$ and $-i$ in the complex field. Precisely because these numbers are automorphic, they would seem each to play exactly the same role in the complex field, for the two numbers are isomorphic copies of one another via complex conjugation. Thus, they are distinct numbers, but play the same structural role, and so we cannot seem to identify the abstract number with the structural roles. This problem occurs, of course, in any mathematical structure that is not rigid.

The numbers $i$ and $-i$ are indiscernible in the field structure of $\C$, but of course we can distinguish them in contexts with additional structure. For example, if we use the Hamilton presentation of the complex numbers as pairs of real numbers, representing $a+bi$ with the pair $(a,b)$, then the number $i$ has coordinates $(0,1)$ and $-i$ has coordinates $(0,-1)$. The complex field equipped with this coordinate structure, perhaps given by the real and imaginary parts operators (let us call it the complex plane, as opposed to the complex field), is a rigid structure in which $i$ and $-i$ are discernible and indeed definable.

Finally, this brings me to the main point of this blog post. What I would like to do is to prove that it is relatively consistent with ZFC that we can definably construct a copy of the complex numbers $\C$ in such a way that not only are $i$ and $-i$ indiscernible in the field structure, but actually the particular set-theoretic objects $i$ and $-i$ are indiscernible in the set-theoretic background in which the construction is undertaken.

Goal. A definable copy of the complex field in which the two square roots of $-1$ are indiscernible not only in the field structure, but also in the set-theoretic background in which the construction of the field takes place.

These two aims are in tension, for we want the particular copy $\C$ to be definable (as a particular set-theoretic object, not just defined up to isomorphism), but the individual square roots of $-1$ to be set-theoretically indiscernible.

The goal is not always achievable. For example, some models of ZFC are pointwise definable, meaning that every individual set is definable in them by some distinguishing set-theoretic property. More generally, if the V=HOD axiom holds, then there is a definable global well order of the set-theoretic universe, and with any such order we could define a linear order on $\{i,-i\}$ in any definable copy of $\C$, which would allow us to define each of the roots. For these reasons, in some models of ZFC, it is not possible to achieve the goal, and the most we can hope for is a consistency result.

But indeed, the consistency goal is achievable.

Theorem. If ZFC is consistent, then there is a model of ZFC that has a definable complete ordered field $\R$ with a definable algebraic closure $\C$, such that the two square roots of $-1$ in $\C$ are set-theoretically indiscernible, even with ordinal parameters.

Proof. The proof makes use of what are known as Groszek-Laver pairs, definable pair sets having no ordinal-definable element. See M. Groszek & R. Laver, Finite Groups of OD-conjugates, Periodica Mathematica Hungarica, v. 18, pp. 87–97 (1987), for a very general version of this. This theorem also appears as theorem 4.6 in my paper Ehrenfeucht’s lemma in set theory, joint with Gunter Fuchs and Victoria Gitman. The arguments provide a model of set theory with a definable pair set $A=\{i,j\}$, such that neither element $i$ nor $j$ is definable from ordinal parameters. The pair set is definable, but neither element is definable.

To undertake the construction, we start with one of the standard definable constructions of the real field $\R$. For example, we could use Dedekind cuts in $\Q$, where $\Q$ is constructed explicitly as the quotient field of the integer ring $\mathbb{Z}$ in some canonical definable manner, and where the integers are definably constructed from a definable copy of the natural numbers $\mathbb{N}$, such as the finite von Neumann ordinals. So we have a definable complete ordered field, the real field $\R$.

Given this and the set $A$, we follow a suggestion of Timothy Gowers in the discussion of this problem on Twitter. Namely, we use the elements of $A$ as variables to form the polynomial ring $\R[A]$, meaning $\R[i,j]$, where $i$ and $j$ are the two elements of $A$. It is not necessary to distinguish the elements of $A$ to form this ring of polynomials, since we take all finite polynomial expressions using real coefficients and elements of $A$ raised to a power. (In particular, although I have referred to the elements as $i$ and $j$, there is to be no suggestion that I am somehow saying $i$ is the “real” $i$; I am not, for I could have called them $j$, $i$ or $j$, $k$ or $a$, $a'$, and so on.) Then we quotient by the ideal $(i^2+1,i+j)$, which is defined symmetrically in the elements of $A$, since it is the same ideal as $(j^2+1,j+i)$. Let $\C$ be the quotient $\C=\R[i,j]/(i^2+1,i+j)$, which makes $i$ and $j$ the two square roots of $-1$, and so by the fundamental theorem of algebra this is a copy of the complex numbers.
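The symmetry of the ideal can be verified directly. The following identity in $\R[i,j]$ is a routine computation, included here as a check rather than as part of the original argument:

$$j^2+1=(i^2+1)+(j-i)\cdot(i+j).$$

Thus $j^2+1$ belongs to the ideal $(i^2+1,i+j)$, and exchanging the roles of $i$ and $j$ shows that $i^2+1$ belongs to $(j^2+1,j+i)$, so the two ideals coincide. In the quotient we have $i+j=0$, so that $j=-i$ and the ring reduces to the familiar $\R[i]/(i^2+1)$.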

Since $\R$ and $A$ were definable, and we didn’t need ever to choose a particular element of $A$ in the construction to define the polynomial ring or the ideal, this copy of $\C$ is definable without parameters. But since $i$ and $j$ are set-theoretically indiscernible in the model of set theory in which we are undertaking the construction, it follows that their equivalence classes in the quotient are also indiscernible. And so we have a definable copy of the complex field $\C$, extending a definable copy of $\R$, in which the two square roots of $-1$ are indiscernible not just in the field structure, but fully in the set-theoretic background in which the fields were constructed. $\Box$

In particular, in this model of set theory, there will be absolutely no way to distinguish the two roots by any further definable structure, whether using second-order or higher-order definitions of the field $\C$ or using any definable set-theoretic property whatsoever.

The analysis suggests a natural further inquiry. Namely,

Question. Is there a model of set theory with a definable copy of the complex field $\C$, such that the hierarchy of relative definability and indiscernibility in $\C$ matches the set-theoretic relative definability and indiscernibility of the objects?

That is, we would want to mimic the phenomenon of $i$ and $-i$ in the above construction with all complex numbers, so that $\sqrt{2}$ and $-\sqrt{2}$ were also indiscernible, not just in this copy of $\C$ but also in the set-theoretic background, and $\sqrt[4]{2}$ was set-theoretically indiscernible from the other new fourth root of $2$, but could set-theoretically define both $\sqrt{2}$ and $-\sqrt{2}$. In other words, I want the set-theoretic definability hierarchy to match the complex-number-theoretic definability hierarchy. I may post this question on MathOverflow, when I formulate a version of it with which I am satisfied. I believe it will be answered by iterated Sacks forcing in a manner similar to that used in many papers by Marcia Groszek, and in particular, in my paper with her, The Implicitly constructible universe.

Pointwise definable and Leibnizian extensions of models of arithmetic and set theory, MOPA seminar CUNY, November 2022

This will be an online talk for the MOPA Seminar at CUNY on 22 November 2022, 1pm. Contact organizers for Zoom access.

Abstract. I shall introduce a flexible new method showing that every countable model of PA admits a pointwise definable end-extension, one in which every individual is definable without parameters. And similarly for models of set theory, in which one may also achieve the Barwise extension result—every countable model of ZF admits a pointwise definable end-extension to a model of ZFC+V=L, or indeed any theory arising in a suitable inner model. A generalization of the method shows that every model of arithmetic of size at most continuum admits a Leibnizian extension, and similarly in set theory. 

Pseudo-countable models

Download pdf at arXiv:2210.04838

Abstract. Every mathematical structure has an elementary extension to a pseudo-countable structure, one that is seen as countable inside a suitable class model of set theory, even though it may actually be uncountable. This observation, proved easily with the Boolean ultrapower theorem, enables a sweeping generalization of results concerning countable models to a rich realm of uncountable models. The Barwise extension theorem, for example, holds amongst the pseudo-countable models—every pseudo-countable model of ZF admits an end extension to a model of ZFC+V=L. Indeed, the class of pseudo-countable models is a rich multiverse of set-theoretic worlds, containing elementary extensions of any given model of set theory and closed under forcing extensions and interpreted models, while simultaneously fulfilling the Barwise extension theorem, the Keisler-Morley theorem, the resurrection theorem, and the universal finite sequence theorem, among others.

Self-similar self-similarity, in The Language of Symmetry

A playful account of symmetry, contributed as a chapter to a larger work, The Language of Symmetry, edited by Benedict Rattigan, Denis Noble, and Afiq Hatta, a collection of essays on symmetry that were also the basis of an event at the British Museum, The Language of Symmetry.

Pre-order the book at: https://www.routledge.com/The-Language-of-Symmetry/Rattigan-Noble-Hatta/p/book/9781032303949

My essay is available here:

Abstract. Let me tell a mathematician’s tale about symmetry. We begin with playful curiosity about a concrete elementary case—the symmetries of the letters of the alphabet, for instance. Seeking the essence of symmetry, however, we are pushed toward abstraction, to other shapes and higher dimensions. Beyond the geometric figures, we consider the symmetries of an arbitrary mathematical structure—why not the symmetries of the symmetries? And then, of course, we shall have the symmetries of the symmetries of the symmetries, and so on, iterating transfinitely. Amazingly, this process culminates in a sublime self-similar group of symmetries that is its own symmetry group, a self-similar self-similarity.

Download my essay for more…or order the book for the complete set!

Pointwise definable and Leibnizian models of arithmetic and set theory, realized in end extensions of a given model, Notre Dame Logic Seminar, October 2022

This will be a talk for the Notre Dame logic seminar, 11 October 2022, 2pm in Hales-Healey Hall.

Abstract.  I shall present very new results on pointwise definable and Leibnizian end-extensions of models of arithmetic and set theory. Using the universal algorithm, I shall present a new flexible method showing that every countable model of PA admits a pointwise definable $\Sigma_n$-elementary end-extension. Also, any model of PA of size at most continuum admits an extension that is Leibnizian, meaning that any two distinct points are separated by some expressible property. Similar results hold in set theory, where one can also achieve V=L in the extension, or indeed any suitable theory holding in an inner model of the original model.

Every countable model of arithmetic or set theory has a pointwise definable end extension

Abstract. According to the math tea argument, there must be real numbers that we cannot describe or define, because there are uncountably many real numbers, but only countably many definitions. And yet, the existence of pointwise definable models of set theory, in which every individual is definable without parameters, challenges this conclusion. In this article, I introduce a flexible new method for constructing pointwise definable models of arithmetic and set theory, showing furthermore that every countable model of Zermelo-Fraenkel ZF set theory and of Peano arithmetic PA has a pointwise-definable end extension. In the arithmetic case, I use the universal algorithm and its $\Sigma_n$ generalizations to build a progressively elementary tower making any desired individual $a_n$ definable at each stage $n$, while preserving these definitions through to the limit model, which can thus be arranged to be pointwise definable. A similar method works in set theory, and one can moreover achieve $V=L$ in the extension or indeed any other suitable theory holding in an inner model of the original model, thereby fulfilling the resurrection phenomenon. For example, every countable model of ZF with an inner model with a measurable cardinal has an end extension to a pointwise-definable model of $\text{ZFC}+V=L[\mu]$.

The sentence asserting its own non-forceability by nontrivial forcing

At the meeting here in Konstanz, Giorgio Venturi and I considered the sentence $\sigma$, which asserts its own non-forceability by nontrivial forcing. That is, $\sigma$ asserts that there is no nontrivial forcing notion forcing $\sigma$. $$\sigma\quad\iff\quad \neg\exists\mathbb{B}\ \Vdash_{\mathbb{B}}\sigma.$$ The sentence $\sigma$ would be a fixed-point of the predicate for not being nontrivially forceable.

In any model of set theory $V$ in which $\sigma$ is true, in light of what it asserts, it would not be forceable by nontrivial forcing, and so it would be false in all nontrivial forcing extensions $V[G]$ of that model. And in any model $W$ where it is false, because of what it asserts, it would be nontrivially forceable, and so it would be true in some forcing extension $W[G]$ of that model.

But this is a contradiction! It cannot ever be true, since if it were true in $V$, it would have to be false in all extensions $V[G]$, and therefore true in some subsequent extension $V[G][H]$. But that model is a forcing extension of $V$, contradicting the claim that it is false in all such extensions.

So it must always be false, but this can’t happen, since then in any given model, in light of what it asserts, it would have to be true. So it cannot ever be true or false.

Conclusion: there is no such sentence $\sigma$ that asserts its own non-forceability by nontrivial forcing. There is no fixed-point for not being nontrivially forceable.

But doesn’t this contradict the fixed-point lemma? After all, the fixed-point lemma shows that we can produce fixed points for any expressible assertion.

The resolution of the conundrum is that although for any given assertion $\varphi$, we can express “$\varphi$ is forceable”, we cannot express “x is the Gödel code of a forceable sentence”, for reasons similar to those for Tarski’s theorem on the nondefinability of truth.

Therefore, we are not actually in a situation to apply the fixed-point lemma. And ultimately the argument shows that there can be no sentence $\sigma$ that asserts “$\sigma$ is not forceable by nontrivial forcing”.

Ultimately, I find the logic of this sentence $\sigma$, asserting its own non-nontrivial forceability, to be a set-theoretic forcing analogue of the Yablo paradox. The sentence holds in a model of set theory whenever it fails in all subsequent models obtained by forcing, and that relation is exactly what arises in the Yablo paradox.

Fregean abstraction in Zermelo-Fraenkel set theory: a deflationary account

Abstract. The standard treatment of sets and definable classes in first-order Zermelo-Fraenkel set theory accords in many respects with the Fregean foundational framework, such as the distinction between objects and concepts. Nevertheless, in set theory we may define an explicit association of definable classes with set objects $F\mapsto\varepsilon F$ in such a way, I shall prove, to realize Frege’s Basic Law V as a ZF theorem scheme, Russell notwithstanding. A similar analysis applies to the Cantor-Hume principle and to Fregean abstraction generally. Because these extension and abstraction operators are definable, they provide a deflationary account of Fregean abstraction, one expressible in and reducible to set theory—every assertion in the language of set theory allowing the extension and abstraction operators $\varepsilon F$, $\# G$, $\alpha H$ is equivalent to an assertion not using them. The analysis thus sidesteps Russell’s argument, which is revealed not as a refutation of Basic Law V as such, but rather as a version of Tarski’s theorem on the nondefinability of truth, showing that the proto-truth-predicate “$x$ falls under the concept of which $y$ is the extension” is not expressible.

Full text available at arXiv:2209.07845

The math tea argument—must there be numbers we cannot describe or define? Pavia Logic Seminar


This will be a talk for the Philosophy Seminar at the IUSS, Scuola Universitaria Superiore Pavia, 28 September 2022.

(Note: This seminar will be held the day before the related conference Philosophy of Mathematics: Foundations, Definitions and Axioms, Italian Network for the Philosophy of Mathematics, 29 September to 1 October 2022. I shall be speaking at that conference on the topic, Fregean abstraction in set theory, a deflationary account.)

Abstract. According to the math tea argument, perhaps heard at a good afternoon tea, there must be some real numbers that we can neither describe nor define, since there are uncountably many real numbers, but only countably many definitions. Is it correct? In this talk, I shall discuss the phenomenon of pointwise definable structures in mathematics, structures in which every object has a property that only it exhibits. A mathematical structure is Leibnizian, in contrast, if any pair of distinct objects in it exhibit different properties. Is there a Leibnizian structure with no definable elements? We shall discuss many interesting elementary examples, eventually working up to the proof that every countable model of set theory has a pointwise definable extension, in which every mathematical object is definable.

Workshop on the Set-theoretic Multiverse, Konstanz, September 2022

Masterclass of “The set-theoretic multiverse” ten years after

Focused on mathematical and philosophical aspects of the set-theoretic multiverse and the pluralist debate in the philosophy of set theory, this workshop will have a master class on potentialism, a series of several speakers, and a panel discussion. To be held 21-22 September 2022 at the University of Konstanz, Germany. (Contact organizers for Zoom access.)

I shall make several contributions to the meeting.

Master class tutorial on potentialism

I shall give a master class tutorial on potentialism, an introduction to the general theory of potentialism that has been emerging in recent work, often developed as a part of research on set-theoretic pluralism, but just as often branching out to a broader application. Although the debate between potentialism and actualism in the philosophy of mathematics goes back to Aristotle, recent work divorces the potentialist idea from its connection with infinity and undertakes a more general analysis of possible mathematical universes of any kind. Any collection of mathematical structures forms a potentialist system when equipped with an accessibility relation (refining the submodel relation), and one can define the modal operators of possibility $\Diamond\varphi$, true at a world when $\varphi$ is true in some larger world, and necessity $\Box\varphi$, true in a world when $\varphi$ is true in all larger worlds. The project is to understand the structures more deeply by understanding their modal nature in the context of a potentialist system. The rise of modal model theory investigates very general instances of potentialist system, for sets, graphs, fields, and so on. Potentialism for the models of arithmetic often connects with deeply philosophical ideas on ultrafinitism. And the spectrum of potentialist systems for the models of set theory reveals fundamentally different conceptions of set-theoretic pluralism and possibility.

The multiverse view on the axiom of constructibility

I shall give a talk on the multiverse perspective on the axiom of constructibility. Set theorists often look down upon the axiom of constructibility V=L as limiting, in light of the fact that all the stronger large cardinals are inconsistent with this axiom, and furthermore the axiom expresses a minimizing property, since $L$ is the smallest model of ZFC with its ordinals. Such views, I argue, stem from a conception of the ordinals as absolutely completed. A potentialist conception of the set-theoretic universe reveals a sense in which every set-theoretic universe might be extended (in part upward) to a model of V=L. In light of such a perspective, the limiting nature of the axiom of constructibility tends to fall away.

Panel discussion: The multiverse view—challenges for the next ten years

This will be a panel discussion on the set-theoretic multiverse, with panelists including myself, Carolin Antos-Kuby, Giorgio Venturi, and perhaps others.

Pointwise definable end-extensions of the universe, Sophia 2022, Salzburg

This will be an online talk for the Salzburg Conference for Young Analytical Philosophy, the SOPhiA 2022 Salzburgiense Concilium Omnibus Philosophis Analyticis, with a special workshop session Reflecting on ten years of the set-theoretic multiverse. The workshop will meet Thursday 8 September 2022 4:00pm – 7:30pm.

The name of the workshop (“Reflecting on ten years…”), I was amazed to learn, refers to the period since my 2012 paper, The set-theoretic multiverse, in the Review of Symbolic Logic, in which I had first introduced my arguments and views concerning set-theoretic pluralism. I am deeply honored by this workshop highlighting my work in this way and focussing on the developments growing out of it.

In this talk, I shall engage in that discussion by presenting some very new work connecting several topics that have been prominent in discussions of the set-theoretic multiverse, namely, set-theoretic potentialism and pointwise definability.

Abstract. Using the universal algorithm and its generalizations, I shall present new work on the possibility of end-extending any given countable model of arithmetic or set theory to a pointwise definable model, one in which every object is definable without parameters. Every countable model of Peano arithmetic, for example, admits an end-extension to a pointwise definable model. And similarly, every countable model of ZF set theory admits an end-extension to a pointwise definable model of ZFC+V=L, as well as to pointwise definable models of other sufficient theories, accommodating large cardinals. I shall discuss the philosophical significance of these results in the philosophy of set theory with a view to potentialism and the set-theoretic multiverse.

Nonlinearity and illfoundedness in the hierarchy of large cardinal consistency strength

Abstract. Many set theorists point to the linearity phenomenon in the hierarchy of consistency strength, by which natural theories tend to be linearly ordered and indeed well ordered by consistency strength. Why should it be linear? In this paper I present counterexamples, natural instances of nonlinearity and illfoundedness in the hierarchy of large cardinal consistency strength, as natural or as nearly natural as I can make them. I present diverse cautious enumerations of ZFC and large cardinal set theories, which exhibit incomparability and illfoundedness in consistency strength, and yet, I argue, are natural. I consider the philosophical role played by “natural” in the linearity phenomenon, arguing ultimately that we should abandon empty naturality talk and aim instead to make precise the mathematical and logical features we had found desirable.

Quantifier elimination

A theory admits quantifier-elimination when every assertion is logically equivalent over the theory to a quantifier-free assertion. This is quite a remarkable property when it occurs, because it reveals a severe limitation on the range of concepts that can be expressed in the theory—a quantifier-free assertion, after all, is able to express only combinations of the immediate atomic facts at hand. As a result, we are generally able to prove quantifier-elimination results for a theory only when we already have a profound understanding of it and its models, and the quantifier-elimination result itself usually leads quickly to classification of the definable objects, sets, and relations in the theory and its models. In this way, quantifier-elimination results often showcase our mastery over a particular theory and its models. So let us present a few quantifier-elimination results, exhibiting our expertise over some natural theories.

Endless dense linear orders

$\def\<#1>{\left\langle#1\right\rangle}\newcommand\Q{\mathbb{Q}}\newcommand\R{\mathbb{R}}\newcommand\N{\mathbb{N}}\newcommand\bottom{\mathord{\perp}}\newcommand{\Th}{\mathop{\rm Th}}\newcommand{\unaryminus}{-}\newcommand\Z{\mathbb{Z}}\newcommand\divides{\mid}$Consider first the theory of an endless dense linear order, such as the rational order $\<\Q,<>$. In light of Cantor’s theorems on the universality of the rational line and the categoricity theorem for countable endless dense linear orders, we already have a fairly deep understanding of this theory and this particular model.

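Consider any two rational numbers $x,y$ in the structure $\<\Q,<>$. What can one say about them? Well, we can certainly make the atomic assertions that $x<y$ or that $x=y$ or that $y<x$. What I claim is that every assertion about $x$ and $y$ in this structure is equivalent to a Boolean combination of these assertions.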

Theorem. The theory of the rational order $\<\Q,<>$ admits elimination of quantifiers—every assertion $\varphi(x,\ldots)$ is logically equivalent in the rational order to a quantifier-free assertion.

Proof. To see this, observe simply by Cantor’s categoricity theorem for countable dense linear orders that any pair $x<y$ in $\Q$ is automorphic to any other such pair $x'<y'$, and similarly for pairs with $x=y$ or $y<x$. Consequently, $\varphi(x,y)$ either holds of all pairs with $x<y$ or of none of them, of all pairs with $x=y$ or none, and of all pairs with $y<x$ or none. The assertion $\varphi(x,y)$ is therefore equivalent to the disjunction of the three atomic relations for which it is realized, including $\top$ as the disjunction of all three atomic possibilities and $\bottom$ as the empty disjunction.

More generally, a similar observation applies to assertions $\varphi(x_1,\ldots,x_n)$ with more free variables. By Cantor’s theorem, every $n$-tuple of points in $\Q$ is automorphic with any other such $n$-tuple of points having the same atomic order relations. Therefore any assertion holding of one such $n$-tuple holds of all $n$-tuples with that same atomic type, and consequently every assertion $\varphi(x_1,\ldots,x_n)$ is logically equivalent in $\<\Q,<>$ to a disjunction of those combinations of atomic relations amongst the variables $x_1,\ldots,x_n$ for which it holds. In particular, every assertion is equivalent in $\<\Q,<>$ to a quantifier-free assertion. In short, the theory of this model $\Th(\<\Q,<>)$ admits elimination of quantifiers. $\Box$

What about other endless dense linear orders? The argument we have given so far is about the theory of this particular model $\<\Q,<>$. In fact, the theory of the rational order is exactly the theory of endless dense linear orders, because this theory is complete, which one can see as an immediate consequence of the categoricity result of Cantor’s theorem and the downward Löwenheim-Skolem theorem. In my book, I have not yet proved the Löwenheim-Skolem theorem at this stage, however, and so let me give a direct proof of quantifier-elimination in the theory of endless dense linear orders, from which we can also derive the completeness of this theory.

Theorem. In the theory of endless dense linear orders, every statement is logically equivalent to a quantifier-free statement.

Proof. To clarify, the quantifier-free statement will have the same free variables as the original assertion, provided we allow $\bottom$ and $\top$ as logical constants. We argue by induction on formulas. The claim is of course already true for the atomic formulas, and it is clearly preserved under Boolean connectives. So it suffices inductively to eliminate the quantifier from $\exists x\, \varphi(x,\ldots)$, where $\varphi$ is itself quantifier-free. We can place $\varphi$ in disjunctive normal form, a disjunction of conjunction clauses, where each conjunction clause is a conjunction of literals, that is, atomic or negated atomic assertions. Since the atomic assertions $x<y$, $x=y$ and $y<x$ are mutually exclusive and exhaustive, the negation of any one of them is equivalent to the disjunction of the other two. Thus we may eliminate any need for negation. By redistributing conjunction over disjunction as much as possible, we reduce to the case of $\exists x\,\varphi$, where $\varphi$ is in disjunctive normal form without any negation. The existential quantifier distributes over disjunction, and so we reduce to the case $\varphi$ is a conjunction of atomic assertions. We may eliminate any instance of $x=x$ or $y=y$, since these impose no requirement. We may assume that the variable $x$ occurs in each conjunct, since otherwise that conjunct commutes outside the quantifier. If $x=y$ occurs in $\varphi$ for some variable $y$ not identical to $x$, then the existential claim is equivalent to $\varphi(y,\ldots)$, that is, by replacing every instance of $x$ with $y$, and we have thus eliminated the quantifier. If $x<x$ occurs as one of the conjuncts, this is not satisfiable and so the assertion is equivalent to $\bottom$. Thus we have reduced to the case where $\varphi$ is a conjunction of assertions of the form $x<y_i$ and $z_j<x$. 
If only one type of these occurs, then the assertion $\exists x\,\varphi$ is outright provable in the theory by the endless assumption and thus equivalent to $\top$. Otherwise, both types $x<y_i$ and $z_j<x$ occur, and in this case the existence of an $x$ obeying this conjunction of assertions is equivalent over the theory of endless dense linear orders to the quantifier-free conjunction $\bigwedge_{i,j}z_j<y_i$, since there will be an $x$ between them in this case and only in this case. Thus, we have eliminated the quantifier $\exists x$, and so by induction every formula is equivalent over this theory to a quantifier-free formula. $\Box$
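As a hedged illustration of the final step of this proof, here is a minimal Python sketch. The function names `eliminate_exists`, `witness`, and `check`, and the encoding of a conjunction as two lists of bounds, are assumptions made for this example rather than notation from the text. The sketch compares the quantifier-free condition with a direct witness construction in the rational order, on small parameter values only.

```python
from fractions import Fraction
from itertools import product

def eliminate_exists(uppers, lowers):
    """Quantifier-free equivalent of  exists x (AND_i x < y_i  and  AND_j z_j < x)
    in an endless dense linear order, evaluated at given parameter values.

    uppers: values of the y_i (x must lie below each of them)
    lowers: values of the z_j (x must lie above each of them)
    """
    # If only one kind of bound occurs, endlessness gives a witness: the formula is "top".
    if not uppers or not lowers:
        return True
    # Otherwise the condition is the conjunction of z_j < y_i over all i, j.
    return all(z < y for z in lowers for y in uppers)

def witness(uppers, lowers):
    """Produce a rational x satisfying all the bounds, or None if there is none."""
    if uppers and lowers:
        lo, hi = max(lowers), min(uppers)
        return (lo + hi) / 2 if lo < hi else None   # density: take the midpoint
    if uppers:
        return min(uppers) - 1                      # no least element
    if lowers:
        return max(lowers) + 1                      # no greatest element
    return Fraction(0)

def check(max_value=3):
    """Compare the two functions on all small parameter choices."""
    values = [Fraction(n, 2) for n in range(2 * max_value + 1)]
    for n_up, n_lo in product(range(3), repeat=2):
        for ups in product(values, repeat=n_up):
            for los in product(values, repeat=n_lo):
                x = witness(ups, los)
                found = x is not None
                if found:
                    # The constructed witness really does satisfy the conjunction.
                    assert all(x < y for y in ups) and all(z < x for z in los)
                if found != eliminate_exists(list(ups), list(los)):
                    return False
    return True
```

Calling `check()` exercises every combination of up to two upper and two lower bounds drawn from a small set of rationals. This is evidence for the equivalence in $\<\Q,<>$ on those inputs, not a substitute for the proof.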

Corollary. The theory of endless dense linear orders is complete.

Proof. If $\sigma$ is any sentence in this theory, then by the theorem above, it is logically equivalent to a Boolean combination of quantifier-free assertions with the same variables. Since $\sigma$ is a sentence and there are no quantifier-free atomic sentences except $\bottom$ and $\top$, it follows that $\sigma$ is equivalent over the theory to a Boolean combination of $\bottom$ and $\top$. All such sentences are equivalent either to $\bottom$ or $\top$, and thus either $\sigma$ is entailed by the theory or $\neg\sigma$ is, and so the theory is complete. $\Box$

Corollary. In any endless dense linear order, the definable sets (allowing parameters) are precisely the finite unions of intervals.

Proof. By intervals we mean a generalized concept allowing either open or closed endpoints, as well as rays, in any of the forms:
$$(a,b)\qquad [a,b]\qquad [a,b)\qquad (a,b]\qquad (a,\infty)\qquad [a,\infty)\qquad (\unaryminus\infty,b)\qquad (\unaryminus\infty,b]$$
Of course any such interval is definable, since $(a,b)$ is defined by $(a<x)\wedge(x<b)$, taking the endpoints $a$ and $b$ as parameters, and $(-\infty,b]$ is defined by $(x<b)\vee (x=b)$, and so on. Thus, finite unions of intervals are also definable by taking a disjunction.

Conversely, any putative definition $\varphi(x,y_1,\ldots,y_n)$ is equivalent to a Boolean combination of atomic assertions concerning $x$ and the parameters $y_i$. Thus, whenever it is true for some $x$ between, above, or below the parameters $y_i$, it will be true of all $x$ in that same interval, and so the set that is defined will be a finite union of intervals having the parameters $y_i$ as endpoints, with the intervals being open or closed depending on whether the parameters themselves satisfy the formula or not. $\Box$

Theory of successor

Let us next consider the theory of a successor function, as realized for example in the Dedekind model, $\<\N,S,0>$, where $S$ is the successor
function $Sn=n+1$. The theory has the following three axioms:
$$\forall x\, (Sx\neq 0)$$

$$\forall x,y\, (Sx=Sy\implies x=y)$$

$$\forall x\, \bigl(x\neq 0\implies \exists y\,(Sy=x)\bigr).$$
In the Dedekind model, every individual is definable, since $x=n$ just in case $x=SS\cdots S0$, where we have $n$ iterative applications of $S$. So this is a pointwise definable model, and hence also Leibnizian. Note the interplay between the $n$ of the object theory and $n$ of the metatheory in the claim that every individual is definable.

What definable subsets of the Dedekind model can we think of? Of course, we can define any particular finite set, since the numbers are definable as individuals. For example, we can define the set $\{1,5,8\}$ by saying, “either $x$ has the defining property of $1$ or it has the defining property of $5$ or it has the defining property of $8$.” Thus any finite set is definable, and by negating such a formula, we see also that any cofinite set, the complement of a finite set, is definable. Are there any other definable sets? For example, can we define the set of even numbers? How could we prove that we cannot? The Dedekind structure has no automorphisms, since all the individuals are definable, and so we cannot expect to use automorphism to show that the even numbers are not definable as a set. We need a deeper understanding of definability and truth in this structure.

Theorem. The theory of a successor function admits elimination of quantifiers—every assertion is equivalent in this theory to a quantifier-free assertion.

Proof. By induction on formulas. The claim is already true for atomic assertions, since they have no quantifiers, and quantifier-free assertions are clearly closed under the Boolean connectives. So it suffices by induction to eliminate the quantifier from assertions of the form $\exists x\, \varphi(x,\ldots)$, where $\varphi$ is quantifier free. We may place $\varphi$ in disjunctive normal form, and since the quantifier distributes over disjunction, we reduce to the case that $\varphi$ is a conjunction of atomic and negated atomic assertions. We may assume that $x$ appears in each atomic conjunct, since otherwise we may bring that conjunct outside the quantifier. We may furthermore assume that $x$ appears on only one side of each atomic clause, since otherwise the statement is either trivially true as with $SSx=SSx$ or $Sx\neq SSx$, or trivially false as with $Sx=SSx$. Consider for example:
$$\exists x\,\bigl[(SSSx=y)\wedge (SSy=SSSz)\wedge (SSSSSx=SSSw)\wedge{}$$
$$\hskip1in{}\wedge (Sx\neq SSSSw)\wedge (SSSSy\neq SSSSSz)\bigr]$$
We can remove duplicated $S$s occurring on both sides of an equation. If $x=S^ky$ appears, we can get rid of $x$ and replace all occurrences with $S^ky$. If $S^nx=y$ appears, we can add $S$’s everywhere and then replace any occurrence of $S^nx$ with $y$, adding the conjuncts $y\neq 0$, $y\neq S0$, and so on up to $y\neq S^{n-1}0$, which together assert over the theory that $y$ has an $n$-fold predecessor. If only inequalities appear, then the statement is simply true.

For example, since the third clause in the formula above is equivalent to $SSx=w$, we may use that to omit any need to refer to $x$, provided that $w$ is neither $0$ nor $S0$, and the formula overall is equivalent to
$$(w\neq 0)\wedge(w\neq S0)\wedge(Sw=y)\wedge (y=Sz)\wedge (w\neq SSSSSw)\wedge (y\neq Sz),$$ which has no quantifiers.
Since the method is completely general, we have proved that the theory of successor admits elimination of quantifiers. $\Box$
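To see the substitution method at work on a smaller instance, here is a minimal Python sketch. The formula $\exists x\,(SSx=w\wedge Sx\neq y)$ is a hypothetical example chosen for this illustration, not one from the text, and the candidate quantifier-free equivalent $(w\neq 0)\wedge(w\neq S0)\wedge(w\neq Sy)$ includes the side condition that $w$ is a double successor, which is an assumption of this sketch. The check is carried out only in the standard model $\<\N,S,0>$, by brute force.

```python
def exists_form(w, y, search_bound):
    """Directly search for x with SSx = w and Sx != y in the standard model."""
    return any(x + 2 == w and x + 1 != y for x in range(search_bound))

def quantifier_free_form(w, y):
    """Candidate equivalent: w != 0, w != S0, and w != Sy."""
    return w != 0 and w != 1 and w != y + 1

def agree_up_to(bound):
    # Any witness x satisfies x < w, so searching below `bound`
    # is exhaustive for every w < bound.
    return all(
        exists_form(w, y, bound) == quantifier_free_form(w, y)
        for w in range(bound)
        for y in range(bound)
    )
```

Here `agree_up_to(30)` compares the two forms for all $w,y<30$. The clause $w\neq Sy$ arises by adding one $S$ to both sides of $Sx\neq y$ and then replacing $SSx$ with $w$.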

It follows that the definable sets in the Dedekind model $\<\N,S,0>$, using only the first-order language of this structure, are precisely the finite and cofinite sets.

Corollary. The definable sets in $\<\N,S,0>$ are precisely the finite and cofinite sets.

Proof. This is because an atomic formula defines a finite or cofinite set, and the collection of finite or cofinite sets is closed under negation and Boolean combinations. Since every formula is equivalent to a quantifier-free formula, it follows that every formula is equivalent to a Boolean combination of atomic formulas, and hence defines a finite or cofinite set. $\Box$

In particular, the concepts of being even or being odd are not definable from the successor operation in $\<\N,S,0>$, since the set of even numbers is neither finite nor cofinite.

Corollary. The theory of a successor function is complete—it is the theory of the standard model $\<\N,S,0>$.

Proof. If $\sigma$ is a sentence in the language of successor, then by the quantifier-elimination theorem it is equivalent to a quantifier-free assertion in the language with the successor function $S$ and constant symbol $0$. But the only quantifier-free sentences in this language are Boolean combinations of equations of the form $S^n0=S^k0$. Since all such equations are settled by the theory, the sentence itself is settled by the theory, and so the theory is complete. $\Box$

We saw that the three axioms displayed above were true in the Dedekind model $\<\N,S,0>$. Are there any other models of these axioms? Yes, there are. For example, we can add another $\Z$-chain of successors on the side, as with $\N+\Z$ or $\N\sqcup\Z$, although we shall see that the order is not definable. What are the definable elements in the enlarged structure? Still $0$ and all its finite successors are definable as before. But no elements of the $\Z$-chains can be definable, because we may perform an automorphism of the structure that translates elements within the $\Z$-chain by a fixed amount.

Let me prove next that the theory implies the induction axiom schema.

Corollary. The theory of successor (the three axioms) implies the induction axiom schema in the language of successor, that is, the following assertion for any assertion $\varphi(x)$:
$$\left[\varphi(0)\wedge\forall x\,\bigl(\varphi(x)\implies\varphi(Sx)\bigr)\right]\implies\forall x\,\varphi(x)$$

Proof. Consider the set defined by $\varphi(x)$. By the earlier corollary, it must be finite or cofinite in the standard model $\<\N,S,0>$. But if the induction hypothesis $\varphi(0)\wedge\forall x\,\bigl(\varphi(x)\implies\varphi(Sx)\bigr)$ holds in the standard model, then $\varphi$ must hold of every number there. So the standard model satisfies this instance of the induction axiom. But the theory of the standard model is the theory of successor, which is complete. So the theory of successor entails this instance of the induction axiom, as desired. $\Box$

In other words, in the trivial theory of successor (the three axioms), we get the corresponding induction axiom for free.

Presburger arithmetic

Presburger arithmetic is the theory of addition on the natural numbers, that is, the theory of the structure $\<\N,+,0,1>$. The numbers $0$ and $1$ are actually definable here from addition alone, since $0$ is the unique additive identity, and $1$ is the only nonzero number $u$ that is not expressible as a sum $x+y$ with both $x\neq u$ and $y\neq u$. So we may view this model if desired as a definitional expansion of $\<\N,+>$, with addition only. The number $2$ is similarly definable as $1+1$, and indeed any number $n$ is definable as $1+\cdots+1$, with $n$ summands, and so this is a pointwise definable model and hence also Leibnizian.

What are the definable subsets? We can define the even numbers, of course, since $x$ is even if and only if $\exists y\,(y+y=x)$. We can similarly define congruence modulo $2$ by $x\equiv_2 y\iff \exists z\,\bigl[(z+z+x=y)\vee (z+z+y=x)\bigr]$. More generally, we can express the relation of congruence modulo $n$ for any fixed $n$ as follows:
$$x\equiv_n y\quad\text{ if and only if }\exists z\,\bigl[(\overbrace{z+\cdots+z}^n+x=y)\vee(\overbrace{z+\cdots+z}^n+y=x)\bigr].$$
What I claim is that this exhausts what is expressible.
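As a quick sanity check on this definition, here is a minimal Python sketch comparing it with the usual remainder test. The function names are assumptions made for this illustration, and the iterated sum $z+\cdots+z$ is written out as a literal sum of $n$ copies of $z$ to mirror the formula.

```python
def congruent_by_definition(x, y, n):
    """x is congruent to y mod n via: exists z with (z+...+z) + x = y
    or (z+...+z) + y = x, where the sum has n copies of z."""
    bound = max(x, y) + 1          # any witness z satisfies z <= max(x, y)
    return any(
        sum([z] * n) + x == y or sum([z] * n) + y == x
        for z in range(bound)
    )

def congruent_by_remainder(x, y, n):
    """The usual test: x and y leave the same remainder on division by n."""
    return x % n == y % n
```

The two disjuncts are needed because subtraction is not available in $\N$: the witness $z$ measures the difference in whichever direction is nonnegative.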

Theorem. Presburger arithmetic in the definitional expansion with all congruence relations, that is, the theory of the structure $$\<\N,+,0,1,\equiv_2,\equiv_3,\equiv_4,\ldots>,$$
admits elimination of quantifiers. In particular, every assertion in the language of $\<\N,+,0,1>$ is equivalent to a quantifier-free assertion in the language with the congruence relations.

Proof. We consider Presburger arithmetic in the language with addition $+$, with all the congruence relations $\equiv_n$ for every $n\geq 2$, and the constants $0$ and $1$. We prove quantifier-elimination in this language by induction on formulas. As before the claim already holds for atomic assertions and is preserved by Boolean connectives. So it suffices to eliminate the quantifier from assertions of the form $\exists x\,\varphi(x,\ldots)$, where $\varphi$ is quantifier-free. By placing $\varphi$ into disjunctive normal form and distributing the quantifier over the disjunction, we may assume that $\varphi$ is a conjunction of atomic and negated atomic assertions. Note that negated congruences are equivalent to a disjunction of positive congruences, such as in the case:
$$x\not\equiv_4 y\quad\text{ if and only if }\quad (x+1\equiv_4y)\vee(x+1+1\equiv_4y)\vee (x+1+1+1\equiv_4 y).$$
We may therefore assume there are no negated congruences in $\varphi$. By canceling like terms on each side of an equation or congruence, we may assume that $x$ occurs on only one side. We may assume that $x$ occurs nontrivially in every conjunct of $\varphi$, since otherwise this conjunct commutes outside the quantifier. Since subtraction modulo $n$ is the same as adding $n-1$ times, we may also assume that all congruences occurring in $\varphi$ have the form $kx\equiv_n t$, where $kx$ denotes the syntactic expression $x+\cdots+x$ occurring in the formula, with $k$ summands, and $t$ is a term not involving the variable $x$. Thus, $\varphi$ is a conjunction of expressions each having the form $kx\equiv_n t$, $ax+r=s$, or $bx+u\neq v$, where $ax$ and $bx$ similarly denote the iterated sums $x+\cdots+x$ and $r,s,u,v$ are terms not involving $x$.

If indeed there is a conjunct of the equality form $ax+r=s$ occurring in $\varphi$, then we may omit the quantifier as follows. Namely, in order to fulfill the existence assertion, we know that $x$ will have to solve $ax+r=s$, and so in particular $r\equiv_a s$, which ensures the existence of such an $x$, but also in this case any inequality $bx+u\neq v$ can be equivalently expressed as $abx+au\neq av$, which since $ax+r=s$ is equivalent to $bs+au\neq av+br$, and this does not involve $x$; similarly, any congruence $kx\equiv_n t$ is equivalent to $akx\equiv_{an}at$, which is equivalent to $ks\equiv_{an} kr+at$, which again does not involve $x$. Thus, when there is an equality involving $x$ present in $\varphi$, then we can use that fact to express the whole formula in an equivalent manner not involving $x$.

So we have reduced to the case $\exists x\,\varphi$, where $\varphi$ is a conjunction of inequalities $bx+u\neq v$ and congruences $kx\equiv_n t$. We can now ignore the inequalities, since if the congruence system has a solution, then it will have infinitely many solutions, and so there will be an $x$ solving any finitely given inequalities. So we may assume that $\varphi$ is simply a list of congruences of the form $kx\equiv_n t$, and the assertion is that this system of congruences has a solution. But there are only finitely many congruences mentioned, and so by working modulo the least common multiple of the bases that occur, there are only finitely many possible values for $x$ to be checked. And so we can simply replace $\varphi$ with a disjunction over these finitely many values $i$, replacing $x$ in each conjunction with $1+\cdots+1$, using $i$ copies of $1$, for each $i$ up to the least common multiple of the bases that arise in the congruences appearing in $\varphi$. If there is an $x$ solving the system, then one of these values of $i$ will work, and conversely.

So we have ultimately succeeded in expressing $\exists x\,\varphi$ in a quantifier-free manner, and so by induction every assertion in Presburger arithmetic is equivalent to a quantifier-free assertion in the language allowing addition, congruences, and the constants $0$ and $1$. $\Box$
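The last step of the proof, checking only finitely many residues, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `system_has_solution` and the encoding of each congruence $kx\equiv_n t$ as a triple `(k, n, t)` are hypothetical, the terms $t$ are taken to be already evaluated to numbers, and `math.lcm` requires Python 3.9 or later.

```python
from math import lcm

def system_has_solution(congruences):
    """Decide whether some natural number x satisfies every k*x = t (mod n).

    congruences: list of triples (k, n, t) with modulus n >= 2, where t is
    the value of the x-free term at the given parameters.
    """
    modulus = lcm(*(n for _, n, _ in congruences)) if congruences else 1
    # The truth of each congruence depends only on x mod `modulus`, so
    # checking the residues 0, 1, ..., modulus - 1 is exhaustive.
    return any(
        all((k * i - t) % n == 0 for k, n, t in congruences)
        for i in range(modulus)
    )
```

For instance, the system $x\equiv_2 1$, $x\equiv_3 2$ is encoded as `[(1, 2, 1), (1, 3, 2)]` and is solved by $x=5$, whereas $2x\equiv_4 1$ has no solution. In the proof itself the same finite search appears as a disjunction over the values $i$, with the parameters left as free variables rather than evaluated.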

Corollary. The definable sets in $\<\N,+,0,1>$ are exactly the eventually periodic sets.

Proof. Every periodic set is definable, since one can specify the set up to the period $p$, and then express the invariance modulo $p$. Any finite deviation from a definable set also is definable, since every individual number is definable. So every eventually periodic set is definable. Conversely, every definable set is defined by a quantifier-free assertion in the language of $\<\N,+,0,1,\equiv_2,\equiv_3,\equiv_4,\ldots>$. We may place the definition in disjunctive normal form, and again replace negated congruences with a disjunction of positive congruences. For large enough values of $x$, the equalities and inequalities appearing in the definition become irrelevant, and so the definition eventually agrees with a finite union of solutions of congruence systems. Every such system is periodic with a period at most the least common multiple of the bases of the congruences appearing in it. And so every definable set is eventually periodic, as desired. $\Box$

Corollary. Multiplication is not definable in $\<\N,+,0,1>$. Indeed, the squaring operation is not definable, and neither is the divisibility relation $p\divides q$.

Proof. If we could define multiplication, or even the squaring operation, then we would be able to define the set of perfect squares, but this is not eventually periodic. Similarly, if we could define the divides relation $p\divides q$, then we could define the set of prime numbers, which is not eventually periodic. $\Box$
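For a concrete feel for eventual periodicity, here is a minimal Python sketch that searches a finite window for a threshold and period. The function name `find_eventual_period` and the search bounds are assumptions made for this illustration. A finite search can only suggest, not prove, that a set fails to be eventually periodic; for the squares, the actual reason is that the gaps between consecutive squares grow without bound.

```python
from math import isqrt

def find_eventual_period(member, window, max_threshold, max_period):
    """Search for (threshold, period) with member(n) == member(n + period)
    for all n from threshold up to the edge of the finite window.
    Returns the first pair found, or None."""
    for threshold in range(max_threshold + 1):
        for period in range(1, max_period + 1):
            if all(member(n) == member(n + period)
                   for n in range(threshold, window - period)):
                return threshold, period
    return None

def is_even(n):
    return n % 2 == 0

def is_square(n):
    return isqrt(n) ** 2 == n
```

With a window of $2000$, thresholds up to $100$, and periods up to $50$, the search finds the pair `(0, 2)` for the even numbers and finds nothing for the perfect squares, since for example $900$ is a square while none of $901,\ldots,950$ is.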

Real-closed field

Let us lastly consider the ordered real field $\<\R,+,\cdot,0,1,<>$. I want to mention (without proof) that a deep theorem of Tarski shows that this structure admits elimination of quantifiers: every assertion is equivalent in this structure to a quantifier-free assertion. In fact all that is needed is that this is a real-closed field, an ordered field in which every odd-degree polynomial has a root and every positive number has a square root.

We can begin to gain insight into this fact by reaching into the depths of our high-school education. Presented with an equation $ax^2+bx+c=0$ with real coefficients and $a\neq 0$, we know by the quadratic formula that the solution is $x=\left(-b\pm\sqrt{b^2-4ac}\right)/(2a)$, and in particular, there is a solution in the real numbers if and only if $b^2-4ac\geq 0$, since otherwise a negative discriminant means the solution is a complex number. In other words, for $a\neq 0$,
$$\exists x\,(ax^2+bx+c=0)\quad\text{ if and only if }\quad b^2-4ac\geq 0.$$
The key point is that this is an elimination-of-quantifiers result, since we have eliminated the quantifier $\exists x$.
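To spell out the calculation behind this equivalence, one can complete the square, which is valid whenever $a\neq 0$:
$$ax^2+bx+c=a\left(x+\frac{b}{2a}\right)^2-\frac{b^2-4ac}{4a}.$$
So the equation $ax^2+bx+c=0$ holds exactly when $\left(x+\frac{b}{2a}\right)^2=\frac{b^2-4ac}{4a^2}$, and a real number is a square in $\R$ just in case it is nonnegative. If one wishes to handle all coefficients uniformly, including the degenerate case $a=0$, then a quantifier-free equivalent (worked out here as an illustration) is
$$\exists x\,(ax^2+bx+c=0)\quad\text{ if and only if }\quad (a\neq 0\wedge b^2-4ac\geq 0)\vee(a=0\wedge b\neq 0)\vee(a=0\wedge b=0\wedge c=0),$$
where the second disjunct covers a genuinely linear equation and the third covers the trivial equation $0=0$.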

Tarski’s theorem proves more generally that every assertion in the language of ordered fields is equivalent in real-closed fields to a quantifier-free assertion. Furthermore, there is a computable procedure to find the quantifier-free equivalent, as well as a computable procedure to determine the truth of any quantifier-free assertion in the theory of real-closed fields.

What I find incredible is that it follows from this that there is a computable procedure to determine the truth of any first-order assertion of Cartesian plane geometry, since all such assertions are expressible in the language of $\<\R,+,\cdot,0,1,<>$. Amazing! I view this as an incredible culmination of two thousand years of mathematical investigation: we now have an algorithm to determine by rote procedure the truth of any statement in Cartesian geometry. A counterpoint, unfortunately, is that the decision procedure is not computationally feasible, since it takes more than exponential time, and it is a topic of research to investigate the running time of the best algorithms.

The logic of definite descriptions

We use a definite description when we make an assertion about an individual, referring to that individual by means of a property that uniquely picks them out. When I say, “the badly juggling clown in the subway car has a sad expression” I am referring to the clown by describing a property that uniquely determines the individual to whom I refer, namely, the clown that is badly juggling in the subway car, that clown, the one fulfilling this description. Definite descriptions in English typically involve the definite article “the” as a signal that one is picking out a unique object or individual.

If there had been no clown in the subway car, then my description wouldn’t have succeeded—there would have been no referent, no unique individual falling under the description. My reference would similarly have failed if there had been a clown, but no juggling clown, or if there had been a juggling clown, but juggling well instead of badly, or indeed if there had been many juggling clowns, perhaps both in the subway car and on the platform, but all of them juggling very well (or at least the ones in the subway car), for in this case there would have been no badly juggling clown in the subway car. My reference would also have failed, in a different way, if the subway car was packed full of badly juggling clowns, for in this case the description would not have succeeded in picking out just one of them. In each of these failing cases, there seems to be something wrong or insensible with my statement, “the badly juggling clown in the subway car has a sad expression.” What would be the meaning of this assertion if there was no such clown, if for example all the clowns were juggling very well?

Bertrand Russell emphasized that when one makes an assertion involving a definite description like this, then part of what is being asserted is that the definite description has succeeded. According to Russell, when I say, “the book I read last night was fascinating,” then I am asserting first of all that indeed there was a book that I read last night and exactly one such book, and furthermore that this book was fascinating. For Russell, the assertion “the king of France is bald” asserts first, that there is such a person as the king of France and second, that the person fitting that description is bald. Since there is no such person as the king of France, Russell takes the statement to be false.

Iota expressions

$\newcommand\satisfies{\models}\newcommand\iiota{℩}\def\<#1>{\left\langle#1\right\rangle}\newcommand\R{\mathbb{R}}\newcommand\Z{\mathbb{Z}}\newcommand\N{\mathbb{N}}\def\valuation[#1]{\pmb{\left[\vphantom{#1}\right.} #1 \pmb{\left.\vphantom{#1}\right]}}$Let us introduce a certain notational formalism, arising in Russell and Whitehead (1910-1913), to assist with our analysis of definite descriptions, namely, the inverted iota notation $\bigl(\iiota x\,\psi(x)\bigr)$, which is a term denoting “the $x$ for which $\psi(x)$.” Such a reference succeeds in a model $M$ precisely when there is indeed a unique $x$ for which $\psi(x)$ holds,
or in other words, when
$$M\satisfies\exists x\forall y\,\bigl(x=y\iff\psi(y)\bigr).$$
The value of the term $\bigl(\iiota x\,\psi(x)\bigr)$ interpreted in $M$ is this unique object fulfilling property $\psi$. The use of iota expressions is perhaps the most meaningful when this property is indeed fulfilled, that is, when the reference succeeds, and we might naturally take them otherwise to be meaningless or undefined, a failed reference.

Because the iota expressions are not always meaningful in this way, their treatment in formal logic faces many of the same issues faced by a formal treatment of partial functions, functions that are not necessarily defined on the whole domain of discourse. According to the usual semantics for first-order logic, the interpretation of a function symbol $f$ is a function defined on the whole domain of the model; in other words, we interpret function symbols with total functions.

But partial functions commonly appear throughout mathematics, and we might naturally seek a formal treatment of them in first-order logic. One immediate response to this goal is simply to point out that partial functions are already fruitfully and easily treated in first-order logic by means of their graph relations $y=g(x)$. One can already express everything one would want to express about a partial function $g$ by reference to the graph relation—whether a given point is in the domain of the function and if it is, what the value of the function is at that point and so on. In this sense, first-order logic already has a robust treatment of partial functions.

In light of that response, the dispute here is not about the expressive power of the logic, but is rather entirely about the status of terms in the language, about whether we should allow partial functions to appear as terms. To be sure, mathematicians customarily form term expressions, such as $\sqrt{x^2-3}$ or $1/x$ in the context of the real numbers $\R$, which are properly defined only on a subset of the domain of discourse, and in this sense, allowing partial functions as terms can be seen as aligned with mathematical practice.

But the semantics are a surprisingly subtle matter. The main issue is that when a term is not defined it may not be clear what the meaning is of assertions formed using that term. To illustrate the point, suppose that $e(x)$ is a term arising from a partial function or from an iota expression that is not universally defined in a model $M$, and suppose that $R$ is a unary relation that holds of every individual in the model. Do we want to say that $M\satisfies\forall x\ R\bigl(e(x)\bigr)$? Is it true that for every person the elephant they are riding is self-identical? Well, some people are not riding any elephant, and so perhaps we might say, no, that shouldn’t be true, since some values of $e(x)$ are not defined, and so this statement should be false. Perhaps someone else suggests that it should be true, because $R\bigl(e(x)\bigr)$ will hold whenever $e(x)$ does succeed in its reference: in every case where someone is riding an elephant, it is self-identical. Or perhaps we want to say the whole assertion is meaningless? If we say it is meaningful but false, however, then it would seem we would want to say $M\satisfies\neg\forall x\ R\bigl(e(x)\bigr)$ and consequently also $M\satisfies\exists x\ \neg R\bigl(e(x)\bigr)$. In other words, in this case we are saying that in $M$ there is some $x$ such that $e(x)$ makes the always-true predicate $R$ false: there is a person such that the elephant they are riding is not self-identical. That seems weird and probably undesirable, since it only works because $e(x)$ must be undefined for this $x$. Furthermore, this situation seems to violate Russell’s injunction that assertions involving a definite description are committed to the success of that reference, for in this case, the truth of the assertion $\exists x\ \neg R\bigl(e(x)\bigr)$ is based entirely on the failure of the reference $e(x)$. 
Ultimately we shall face such decisions in how to define the semantics in the logic of iota expressions and more generally in the first-order logic of partial functions as terms.

The strong semantics for iota expressions

Let me first describe what I call the strong semantics for the logic of iota expressions. Inspired by Russell’s theory of definite descriptions, we shall define the truth conditions for every assertion in the extended language allowing iota expressions $(\iiota x\,\varphi)$ as terms. Notice that $\varphi$ itself might already have iota expressions inside it; the formulas of the extended language can be graded in this way by the nesting rank of their iota expressions. The strong semantics I shall describe here also works more generally for the logic of partial functions.

For any model $M$ and valuation $\valuation[\vec a]$, we shall define the satisfaction relation $M\satisfies\varphi\valuation[\vec a]$ and the term valuation $t^M\valuation[\vec a]$ recursively. In this logic, the interpretation of terms $t^M\valuation[\vec a]$ in a model $M$ will be merely partial, not necessarily defined in every case. Nevertheless, for a given iota expression $(\iiota x\,\varphi)$, if we have already defined the semantics for $\varphi$, then the term $\bigl(\iiota x\,\varphi\bigr)^M\valuation[\vec a]$ is defined to be the unique individual $b$ in $M$, if there is one, for which $M\satisfies\varphi\valuation[b,\vec a]$, where $b$ is the value of variable $x$ in this valuation; and if there is not a unique individual with this property, then this term is not defined. An atomic assertion of the form $(Rt_1\cdots t_n)\valuation[\vec a]$ is defined to be true in $M$ if all the terms $t_i^M\valuation[\vec a]$ are defined in $M$ and $R^M(t_1^M\valuation[\vec a],\ldots,t_n^M\valuation[\vec a])$ holds. And similarly an atomic assertion of the form $s=t$ is true in $M$ with valuation $\valuation[\vec a]$ if and only if $t^M\valuation[\vec a]$ and $s^M\valuation[\vec a]$ are both defined and they are equal. Notice that if a term expression is not defined, then every atomic assertion it appears in will be false. Thus, in the atomic case we have implemented Russell’s theory of definite descriptions.

We now simply extend the satisfaction relation recursively in the usual way through Boolean connectives and quantifiers. That is, the model satisfies a conjunction $M\satisfies(\varphi\wedge\psi)\valuation[\vec a]$ just in case it satisfies both of them $M\satisfies\varphi\valuation[\vec a]$ and $M\satisfies\psi\valuation[\vec a]$; it satisfies a negation $M\satisfies\neg\varphi\valuation[\vec a]$ if and only if it fails to satisfy $\varphi$, and so on as usual with the other logical connectives. For quantifiers, $M\satisfies\forall x\ \varphi\valuation[\vec a]$ if and only if $M\satisfies\varphi\valuation[b,\vec a]$ for every individual $b$ in $M$, using the valuation assigning value $b$ to variable $x$; and similarly for the existential quantifier.

The strong semantics in effect combine Russell’s treatment of definite descriptions in the case of atomic assertions with Tarski’s disquotational theory to extend the truth conditions recursively to complex assertions. The strong semantics are complete—every assertion $\varphi$ or its negation $\neg\varphi$ will be true in $M$ at any valuation of the free variables, because if $\varphi$ is not true in $M$ at $\valuation[\vec a]$, then by definition, $\neg\varphi$ will be declared true. In particular, exactly one of the sentences will be true, even if they involve definite descriptions that do not refer.

No new expressive power

The principal observation to be made initially about the logic of iota expressions is that they offer no new expressive power to our language. Every assertion that can be made in the language with iota expressions can be made equivalently without them. In short, iota expressions are logically eliminable.

Theorem. Every assertion in the language with iota expressions is logically equivalent in the strong semantics to an assertion in the original language.

Proof. We prove this by induction on formulas. Of course, the claim is already true for all assertions in the original language, and since the strong semantics in the iota expression logic follow the same recursion for Boolean connectives and quantifiers as in the original language, it suffices to remove iota expressions from atomic assertions $Rt_1\cdots t_n$, where some of the terms involve iota expressions. Consider for simplicity the case of $R(\iiota x\,\varphi)$, where $\varphi$ is a formula in the original language with no iota expressions. This assertion is equivalent to the assertion that $(\iiota x\,\varphi)$ exists, which we expressed above as $\exists x\forall y\, \bigl(x=y\iff\varphi(y)\bigr)$, together with the assertion that $R$ holds of that value, which can be expressed as $\forall x\,\bigl(\varphi(x)\implies Rx\bigr)$. The argument is similar if additional functions were applied on top of the iota term or if there were multiple iota terms—one simply says that the overall atomic assertion is true if each of the iota expressions appearing in it is defined and the resulting values fulfill the atomic assertion. By this means, the iota terms can be systematically eliminated from the atomic assertions in which they appear, and therefore every assertion is equivalent to an assertion not using any iota expression. $\Box$
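As a worked instance of the elimination step, here is the simplest case displayed in full, assuming as in the proof that $\varphi$ has no iota expressions. The second line is an additional illustrative case, not spelled out in the proof, for an equation between two iota terms, where $\psi$ is a further formula of the original language introduced only for this illustration; the symbol $\equiv$ is used here for logical equivalence in the strong semantics.

$$R(\iiota x\,\varphi)\quad\equiv\quad \exists x\forall y\,\bigl(x=y\iff\varphi(y)\bigr)\ \wedge\ \forall x\,\bigl(\varphi(x)\implies Rx\bigr)$$
$$(\iiota x\,\varphi)=(\iiota x\,\psi)\quad\equiv\quad \exists x\,\Bigl(\forall y\,\bigl(x=y\iff\varphi(y)\bigr)\ \wedge\ \forall y\,\bigl(x=y\iff\psi(y)\bigr)\Bigr)$$

In the second line the right-hand side says that a single individual is both the unique witness of $\varphi$ and the unique witness of $\psi$, which is exactly the condition that both terms are defined and equal.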

In light of this theorem, perhaps there is little at stake in the dispute about whether to augment our language with iota expressions, since they add no formal expressive power.

Criticism of the strong semantics

I should like to make several criticisms of the strong semantics concerning how well it fulfills the goal of providing a logic of iota expressions based on Russell’s theory of definite descriptions.

Does not actually fulfill the Russellian theory. We were led to the strong semantics by Russell’s theory of definite descriptions, and many logicians take the strong semantics as a direct implementation of Russell’s theory. But is this right? To my way of thinking, at the very heart of Russell’s theory is the idea that an assertion involving reference by definite description carries an implicit commitment that those references are successful. Let us call this the implicit commitment to reference, and I should like to consider this idea on its own, apart from whatever Russell’s view might have been. My criticism here is that the strong semantics does not actually realize the implicit commitment to reference for all assertions.

It does fulfill the implicit commitment to reference for atomic assertions, to be sure, for we defined that an atomic assertion $At$ involving a definite description term $t$ is true if and only if the term $t$ successfully refers and the referent falls under the predicate $A$. The atomic truth conditions thus implement exactly what the Russellian implicit commitment to reference requires.

When we extend the semantics through the Tarskian compositional recursion, however, we lose that feature. Namely, if an atomic assertion $At$ is false because the definite description term $t$ has failed to refer successfully, then the Tarskian recursion declares the biconditional $At\iff At$ to be true, the biconditional of two false assertions, and $\neg At$ and $At\implies At$ are similarly declared true in this case for the strong semantics. In all three cases, we have a sentence with a failed definite description and yet we have declared it true anyway, violating the implicit commitment to reference.

The tension between Russell and Tarski. The issue reveals an inherent tension between Russell and Tarski, a tension between a fully realized implicit commitment to reference and the compositional theory of truth. Specifically, the examples above show that if we follow the Russellian theory of definite descriptions for atomic assertions and then apply the Tarskian recursion, we will inevitably violate the implicit commitment to reference for some compound assertions. In other words, to require that every true assertion making reference by definite description commits to the success of those references simply does not obey the Tarski recursion. In short, the implicit commitment to reference is not compositional.

Does not respect stipulative definitions. The strong semantics does not respect stipulative definitions in the sense that the truth of an assertion is not always preserved when replacing a defined predicate by its definition.

Consider the ring of integers, for example, and the sentence “the largest prime number is odd.” We could formalize this as an atomic assertion $O(\iiota x\, Px)$, where $Ox$ is the oddness predicate expressing that $x$ is odd and $Px$ expresses that $x$ is a largest prime number. Since there is no largest prime number in the integers, the definite description $(\iiota x\,Px)$ has failed, and so on the strong semantics the sentence $O(\iiota x\, Px)$ is false in the integer ring.

But suppose that we had previously introduced the oddness predicate $Ox$ by stipulative definition over the ring of integers $\<\Z,+,\cdot,0,1>$, in the plain language of rings, defining oddness via $Ox\iff \neg\exists y\, (y+y=x)$. In the original language, therefore, the assertion “the largest prime is odd” would be expressed as $\neg\exists y\, \bigl(y+y=(\iiota x\,Px)\bigr)$. Because the iota expression does not succeed, the atomic assertion $y+y=(\iiota x\,Px)$ is false for every particular value of $y$, and so the existential $\exists y$ is false, making the negation true. In this formulation, therefore, “the largest prime is odd” is true! So although the sentence was false in the definitional expansion using the atomic oddness predicate, it became true when we replaced that predicate by its definition, and so the strong semantics does not respect the replacement of a stipulatively defined predicate by its definition.

Some philosophical treatments of these kinds of cases focus on ambiguity and scope. One can introduce lambda expressions $\bigl(\lambda x\,\varphi(x)\bigr)$, for example, expressing property $\varphi$ in a manner to be treated as atomic for the purpose of applying the Russellian theory, so that $\bigl(\lambda x\,\varphi(x)\bigr)(t)$ expresses that $t$ is defined and $\varphi(t)$ holds. In simple cases, these lambda expressions amount to stipulative definitions, while in more sophisticated instances, subtle levels of dependence are expressed when $\varphi$ has other free variables not to be treated as atomic in this way by this expression. Nevertheless, the combined lambda/iota expressions are fully eliminable, as every assertion in that expanded language is equivalently expressible in the original language. To my way of thinking, the central problem here is not one of missing notation, precisely because we can already express the intended meaning without any lambda or iota expressions at all. Rather, the fundamental phenomenon is the fact that the strong semantics violates the expected logical principles for the treatment of stipulative definitions.

Does not respect logical equivalence. The strong semantics of iota expressions does not respect logical equivalence. In the integer ring $\<\Z,+,\cdot,0,1>$, there are two equivalent ways to express that a number $x$ is odd, namely,
$$\text{Odd}_0(x) = \neg\exists y\, (y+y=x)$$ $$\text{Odd}_1(x) = \exists y\, (y+y+1=x).$$
These two predicates are equivalent in the integers,
$$\<\Z,+,\cdot,0,1>\satisfies\forall x\, \bigl(\text{Odd}_0(x)\iff\text{Odd}_1(x)\bigr).$$
And yet, if we assert “the largest prime number is odd” using these two equivalent formulations, either as $\text{Odd}_0(\iiota x\,Px)$ or as $\text{Odd}_1(\iiota x\,Px)$, then we get opposite truth values. The first formulation is true, as we observed above, but the second is false, because the atomic assertion $y+y+1=(\iiota x\,Px)$ is false for every particular $y$ and so the existential fails. So this is a case where the substitution instance of an iota expression into logically equivalent assertions yields different truth values.
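The divergence can be reproduced mechanically. The following is a minimal Python sketch of the strong semantics for this one example, under two assumptions that are not part of the text: a failed iota term is modelled as `None`, and the quantifier over the integers is cut down to a finite window `WINDOW`. The names `atomic_eq`, `odd0` and `odd1` are hypothetical helpers chosen for the illustration.

```python
# Strong semantics, atomic case: an equation is true only when both
# sides are defined and equal. A failed iota term is modelled as None
# (an assumption of this sketch).
WINDOW = range(-50, 51)  # finite stand-in for the integers (assumption)


def atomic_eq(lhs, rhs):
    """Truth value of the atomic assertion lhs = rhs in the strong semantics."""
    return lhs is not None and rhs is not None and lhs == rhs


def odd0(t):
    """Odd_0(t): not exists y (y + y = t)."""
    return not any(atomic_eq(y + y, t) for y in WINDOW)


def odd1(t):
    """Odd_1(t): exists y (y + y + 1 = t)."""
    return any(atomic_eq(y + y + 1, t) for y in WINDOW)


# The description "the largest prime number" fails to refer.
largest_prime = None

# On defined values well inside the window, the two predicates agree.
agree_on_defined = all(odd0(n) == odd1(n) for n in range(-20, 21))

print(agree_on_defined)
print(odd0(largest_prime), odd1(largest_prime))
```

With the failed term, every atomic instance is false regardless of the window, so the negated existential `odd0` comes out true while the plain existential `odd1` comes out false, even though the two predicates agree on every defined value tested.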

Does not respect instantiation. The strong semantics for iota expressions does not respect universal instantiation. Every structure will declare some instances of $[\forall x\,\varphi(x)]\implies\varphi(t)$ false. Specifically, in any structure $\forall x\, (x=x)$ is true, but if $(\iiota x\,\varphi)$ is a failing iota expression, such as $(\iiota x\, x\neq x)$, then $(\iiota x\,\varphi)=(\iiota x\,\varphi)$ is false, because this is an atomic assertion with a failing definite description. So we have a false instantiation of a true universal.

The weak semantics for iota expressions

In light of these criticisms, let me describe an alternative semantics for the logic of definite descriptions, a more tentative and hesitant semantics, yet in many respects both reasonable and appealing. Namely, the weak semantics takes the line that in order for an assertion about an individual specified by definite description to be meaningful, the description must in fact succeed in its reference—otherwise it is not meaningful. On this account, for example, the sentence “the largest prime number is odd” is meaningless in the integer ring, without a truth value, and similarly with any further sentence built from this assertion. On the weak semantics, the assertion fails to express a proposition because there is no largest prime number in the integers.

On the weak semantics, we first make a judgement about whether an assertion is meaningful before stating whether it is true or not. As before, an iota expression $(\iiota x\,\varphi)$ is undefined in a model at a valuation if there is not a unique fulfilling instance (using the weak semantics recursively in this evaluation). In the atomic case of satisfaction $(Rt_1\cdots t_n)\valuation[\vec a]$, the weak semantics judges the assertion to be meaningful only if all of the term expressions $t_i^M\valuation[\vec a]$ are defined, and in this case the assertion is true or false accordingly as to whether $R^M(t_1^M\valuation[\vec a],\ldots,t_n^M\valuation[\vec a])$ holds. If one or more of the terms is not defined, then the assertion is judged meaningless. Similarly, in the disquotational recursion through the Boolean connectives, we say that $\varphi\wedge\psi$ is meaningful only when both conjuncts are meaningful, and true exactly when both are true. And similarly for $\varphi\vee\psi$, $\varphi\implies\psi$, $\neg\varphi$, and $\varphi\iff\psi$. In each case, with the weak semantics we require all of the constituent subformulas to be meaningful in order for the whole expression to be judged meaningful, whether or not the truth values of those assertions could affect the overall truth value.

The compositional theory of truth implicitly defines a well-founded relation on the instances of satisfaction $M\satisfies\varphi\valuation[\vec a]$, with each instance reducing via the Tarski recursion to certain relevant earlier instances. In the weak semantics, an assertion $M\satisfies\varphi\valuation[\vec a]$ is meaningful if and only if every iota term valuation arising hereditarily in the recursive calculation of the truth values is successfully defined.

The choice to use the weak semantics can be understood as a commitment to use robust definite descriptions that succeed in their reference. For meaningful assertions, one should ensure that all the relevant definite descriptions succeed, such as by using robust descriptions with default values in cases of failure, rather than relying on the semantical rules to paper over or fix up the effects of sloppy failed references. Nevertheless, the weak and the strong semantics agree on the truth value of any assertion found to be meaningful. In this sense, being true in the weak semantics is simply a tidier way to be true, one without sloppy failures of reference.

The weak semantics can be seen as a nonclassical logic in that not all instances of the law of excluded middle $\varphi\vee\neg\varphi$ will be declared true, since if $\varphi$ is not meaningful then neither is this disjunction. But the logic is not intuitionist, since not all instances of $\varphi\implies\varphi$ are true. Meanwhile, the weak semantics are classical in the sense that in every meaningful instance, the law of excluded middle holds, as well as double-negation elimination.
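These observations about the weak semantics can be checked with a small Python sketch of the propositional connectives. It assumes, purely as an encoding choice, that a judgement is `True`, `False`, or `None` for "not meaningful"; the function names are hypothetical.

```python
# Weak semantics for the connectives: a compound is meaningful only when
# all of its constituents are. None encodes "not meaningful" (assumption).

def w_not(p):
    return None if p is None else (not p)


def w_or(p, q):
    return None if p is None or q is None else (p or q)


def w_implies(p, q):
    return None if p is None or q is None else ((not p) or q)


def w_atomic(relation, *terms):
    """Atomic case: not meaningful if any term is undefined (None)."""
    if any(t is None for t in terms):
        return None
    return relation(*terms)


def is_odd(n):
    return n % 2 == 1


failed = w_atomic(is_odd, None)  # "the largest prime number is odd"
fine = w_atomic(is_odd, 7)       # an assertion about a defined term

print(w_or(failed, w_not(failed)))  # excluded middle, failed reference
print(w_implies(failed, failed))    # phi implies phi, failed reference
print(w_or(fine, w_not(fine)))      # excluded middle, meaningful case
```

The first two lines print `None`, since neither compound is judged meaningful, and the third prints `True`, in line with the remark that the classical laws hold in every meaningful instance.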

The natural semantics for iota expressions

The natural semantics is a somewhat less hesitant semantics guided by the idea that an assertion with iota expressions or partially defined terms is meaningful when sufficiently many of those terms succeed in their reference to determine the truth value. In this semantics, we take a conjunction $\varphi\wedge\psi$ as meaningfully true, if both $\varphi$ and $\psi$ are meaningfully true, and meaningfully false if at least one of them is meaningfully false. A disjunction $\varphi\vee\psi$ is meaningfully true, if at least one of them is meaningfully true, and meaningfully false if both are meaningful and false. A negation $\neg\varphi$ is meaningful exactly when $\varphi$ is, and with the opposite truth value. In the natural semantics an implication $\varphi\implies\psi$ is meaningfully true if either $\varphi$ is meaningful and false or $\psi$ is meaningful and true, and $\varphi\implies\psi$ is meaningfully false if $\varphi$ is meaningfully true and $\psi$ is meaningfully false. This formulation of the semantics enables a robust treatment of conditional assertions, such as in the ordered real field where we might take division $y/x$ as a partial function, defined as long as the denominator is nonzero. In the natural semantics, the assertion
$$\forall x\, (0<x\implies 0<1/x)$$
comes out meaningful and true in the reals, whereas it is not meaningful in the weak semantics because $1/0$ is not defined, which might seem rather too weak because this case arises only when it is also ruled out by the antecedent. A biconditional $\varphi\iff\psi$ is meaningful if and only if both $\varphi$ and $\psi$ are meaningful, and it is true if these have the same truth value. In the natural semantics, an existential assertion $\exists x\ \varphi\valuation[\vec a]$ is judged meaningful and true if there is an instance $\varphi\valuation[b,\vec a]$ that is meaningful and true, and meaningfully false if every instance is meaningful but false. A universal assertion $\forall x\ \varphi\valuation[\vec a]$ is meaningfully true if every instance $\varphi\valuation[b,\vec a]$ is meaningful and true, and meaningfully false if at least one instance is meaningfully false.
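The reciprocal example can be sketched in Python to contrast the two semantics. The encoding is an assumption of the sketch: judgements are `True`, `False`, or `None` for "not meaningful", and a finite sample of rationals stands in for the reals. All function names are hypothetical.

```python
from fractions import Fraction


def n_implies(p, q):
    """Natural semantics: decided as soon as enough information is present."""
    if p is False or q is True:
        return True
    if p is True and q is False:
        return False
    return None


def w_implies(p, q):
    """Weak semantics, for contrast: both parts must be meaningful."""
    if p is None or q is None:
        return None
    return (not p) or q


def n_forall(instances):
    vals = list(instances)
    if any(v is False for v in vals):
        return False  # one meaningfully false instance suffices
    if all(v is True for v in vals):
        return True
    return None


def w_forall(instances):
    vals = list(instances)
    if any(v is None for v in vals):
        return None
    return all(vals)


def recip(x):
    """The partial function 1/x, undefined (None) at 0."""
    return None if x == 0 else 1 / x


def less(s, t):
    """Atomic s < t, not meaningful when a term is undefined."""
    return None if s is None or t is None else s < t


# Finite sample standing in for the reals (assumption); it includes 0.
SAMPLE = [Fraction(k, 4) for k in range(-8, 9)]

natural = n_forall(n_implies(less(0, x), less(0, recip(x))) for x in SAMPLE)
weak = w_forall(w_implies(less(0, x), less(0, recip(x))) for x in SAMPLE)
print(natural, weak)
```

On this sample the natural semantics returns `True`, because at $x=0$ the antecedent is meaningfully false, whereas the weak semantics returns `None`, because the instance at $x=0$ involves the undefined term $1/0$.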

Still no new expressive power

The weak semantics and the natural semantics both address some of the weird aspects of the strong semantics by addressing head-on and denying the claim that assertions made about nonexistent individuals are meaningful. This could be refreshing to someone put off by any sort of claim made about the king of France or the largest prime number in the integers—such a person might prefer to regard these claims as not meaningful. And yet, just as with the strong semantics, the weak semantics and the natural semantics offer no new expressive power to the logic.

Theorem. The language of first-order logic with iota expressions in either the weak semantics or the natural semantics has no new expressive power—for every assertion in the language with iota expressions, there are formulas in the original language expressing that the given formula is meaningful, respectively in the two semantics, and others expressing that it is meaningful and true.

Proof. This can be proved by induction on formulas similar to the proof of the no-new-expressive-power theorem for the strong semantics. The key point is that the question whether a given instance of iota expression $(\iiota x\,\varphi)$ succeeds in its reference or not is expressible by means of $\varphi$ without using any additional iota terms not already in $\varphi$. By augmenting any atomic assertion in which such an iota expression appears with the assertions that the references have succeeded, we thereby express the meaningfulness of the atomic expression. Similarly, what it means to be meaningful and what it means to be true in the weak semantics or in the natural semantics were themselves defined recursively, and so in this way we can systematically eliminate the iota expressions and simultaneously create formulas asserting the meaningfulness and the truth of any given assertion in the expanded iota language. $\Box$

Let me next prove the senses in which both the weak and natural semantics survive the criticisms I had mentioned earlier for the strong semantics.


  1. Both the weak and natural semantics fulfill Russell’s theory of definite descriptions—if an assertion is true, then every definite description relevant for this was successful.
  2. Both the weak and natural semantics respect Tarski’s compositional theory of truth—if an assertion is meaningful, then its truth value is determined by the Tarski recursion.
  3. Both the weak and natural semantics respect stipulative definitions—replacing any defined predicate in a meaningful assertion by its definition, if meaningful, preserves the truth value.
  4. Both the weak and natural semantics respect logical equivalence—if $\varphi(x)$ and $\psi(x)$ are logically equivalent ($x$ occurring freely in both) and $t$ is any term, then $\varphi(t)$ and $\psi(t)$ get the same truth judgement.
  5. Both the weak and natural semantics respect universal instantiation—if $\forall x\varphi(x)$ and $\varphi(t)$ are both meaningful, then $[\forall x\varphi]\implies\varphi(t)$ is meaningful and true.

Proof. For statement (1), the notion of relevance here for the weak semantics is that of arising earlier in the well-founded recursive definition of truth, while in the natural semantics we are speaking of relevant instances sufficient for the truth calculation. In either semantics, whenever an assertion is declared true, all the definite descriptions relevant for this fact are successful. A truth judgement is never made on the basis of a failed reference.

Statement (2) is immediate, since both the weak and the natural semantics are defined in a compositional manner.

Statement (3) is proved by induction. If we have introduced predicate $Rx$ by stipulative definition $\rho(x)$, which is meaningful at every point of evaluation in the model, then whenever a term $t$ is defined, the predicate $Rt$ in the model will be meaningful and have the same truth value as $\rho(t)$. So replacing $R$ with $\rho$ in any formula will give the same truth judgement.

For statement (4), suppose that $\varphi(x)$ and $\psi(x)$ are logically equivalent, meaning that $x$ occurs freely in both assertions and the model gives the same truth judgement to them at every point. If $t$ is not defined, then both $\varphi(t)$ and $\psi(t)$ are declared meaningless (this is why we need the free variable actually to occur in the formulas). If $t$ is defined, then by the assumption of logical equivalence, $\varphi(t)$ and $\psi(t)$ will get the same truth value.

Statement (5) follows immediately for the weak semantics, since if $\forall x\,\varphi(x)$ and $\varphi(t)$ are both meaningful, then in particular, if $x$ actually occurs freely in $\varphi$ then $t$ must be defined in the weak semantics, in which case $\varphi(t)$ will follow from $\forall x\,\varphi(x)$. In the natural semantics, if $\forall x\,\varphi(x)$ is meaningfully true, then $\varphi(t)$ also will be, and if it is meaningfully false, then the implication $[\forall x\,\varphi]\implies\varphi(t)$ is meaningfully true. $\Box$

Deflationary philosophical conclusions

To my way of thinking, the principal philosophical conclusion to make in light of the no-new-expressive-power theorems is that there is nothing at stake in the debate about whether to add iota expressions to the logic or whether to use the strong or weak semantics. The debate is moot, and we can adopt a deflationary stance, because any proposition or idea that we might wish to express with iota expressions and definite descriptions in one or the other semantics can be expressed in the original language. Whether to use the expanded linguistic resources provided by iota expressions is ultimately a matter of mere convenience or logical aesthetics. If the logical feature or idea you wish to convey is more clearly or easily conveyed with iota expressions, then go to town! But in principle, they are eliminable.

Similarly, there is little at stake in the dispute between the weak and the strong semantics. In fact they agree on the truth judgements of all meaningful assertions. In light of this, there seems little reason not to proceed with the strong semantics, since it provides coherent truth values to the assertions lacking a truth value in the weak semantics. The criticisms I had mentioned may be outweighed simply by having a complete assignment of truth values. The question of whether assertions made about failed definite descriptions are actually meaningful can be answered by the reply: they are meaningful, because the strong semantics provide the meaning. But again, because every idea that can be expressed in these semantics can also be expressed in the original language without iota expressions, there is nothing at stake in this decision.

This blog post is adapted from my book-in-progress, Topics in Logic, an introduction to a wide selection of topics in logic for philosophers, mathematicians, and computer scientists.

Checkmate is not the same as a forced capture of the enemy king in simplified chess

In my imagination, and perhaps also in historical reality, our current standard rules of chess evolved from a simpler era with a simpler set of rules for the game. Let us call it simplified chess. In simplified chess there was the same 8×8 board with the same pieces as now moving under the same movement rules. But the winning aim was different, and simpler. Namely, the winning goal of simplified chess was simply to capture the enemy king. You won the game by capturing the opposing king, just as you would capture any other piece.

There was therefore no need to mention check or checkmate in the rules. Rather, these words described important situations that might arise during game play. Specifically, you have placed your opponent in check, if you were in a position to capture their king on the next move, unless they did something to prevent it. You placed your opponent in checkmate, if there was indeed nothing that they could do to prevent it.

In particular, in simplified chess there was no rule against leaving your king in check or even moving your king into check. Rather, this was simply an inadvisable move, because your opponent could take it and thereby win. Checkmate was merely the situation that occurred when all your moves were like this.

It is interesting to notice that it is common practice in blitz chess and bullet chess to play with something closer to simplified chess rules—it is often considered winning simply to capture the opposing king, even if there was no checkmate. This is also how the chess variant bughouse is usually played, even in official tournaments. To my way of thinking, there is a certain attractive simplicity to the rules of simplified chess. The modern chess rules might seem to be ridiculous for needlessly codifying into the rules a matter that could simply be played out on the board by actually capturing the king.

Part of what I imagine is that our contemporary rules could easily have evolved from simplified chess through a practice of polite game play. In order to avoid the humiliation of actually capturing and removing the opponent’s king and replacing it with one’s own piece, which indeed might even have been a lowly pawn, a custom developed to declare the game over when this was a foregone conclusion. In other words, I find it very reasonable to suppose that the winning checkmate rule simply arose from a simplified rule set by common practice of respectful game play.

I am not a chess historian, and I don’t really know if chess did indeed evolve from a simpler version of the game like this, but it strikes me as very likely that something like this must have happened. I await comment from the chess historians. Let me add though that I would also find it reasonable to expect that simplified chess might also have had no provision for opening pawns moving two squares instead of just one. Such a rule could arise naturally as an agreed upon compromise to quicken the game and get to the more interesting positions more quickly. But once that rule was adopted, then the en passant rule is a natural corrective to prevent abuse. I speculate that castling may have arisen similarly—perhaps many players in a community customarily took several moves, perhaps in a standard maneuver sequence, to accomplish the effect of hiding their kings away toward the corner and also bringing their rooks to the center files; the practice could have simply been codified into a one-move practice.

My main point in this post, however, does not concern these other rules, but rather only the checkmate winning condition and a certain logic-of-chess issue it brings to light.

When teaching chess to beginners, it is common to describe the checkmate winning situation in terms of the necessary possibility of capturing the king. One might say that a checkmate situation means the king is attacked, and there is nothing the opponent can do to prevent the king from being captured.

This explanation suggests a general claim in the logic of chess: a position is in checkmate (in the contemporary rules) if and only if the winning player can necessarily capture the opposing king on the next move (in simplified chess).

This claim is mostly true. In most instances, when you have placed your opponent in checkmate, then you would be able to capture their king on your next move in simplified chess, since all their moves would leave the king in check, and so you could take it straight away.

But I would like to point out something I found interesting about this checkmate logic claim. Namely, it isn’t true in all cases. There is a position, I claim, that is checkmate in the modern rules, but in simplified chess, the winning player would not be able to capture the enemy king on the next move.

My example proceeds from the following (rather silly) position, with black to move. (White pawns move upward.)

Of course, Black should play the winning move: knight to C7, as shown here:

This places the white king in check, and there is nothing to be done about it, and so it is checkmate. Black has won, according to the usual chess rules of today.

But let us consider the position under the rules of simplified chess. Will Black be able to capture the white king? Well, after Black’s nC7, it is now White’s turn. Remember that in simplified chess, it is allowed (though inadvisable) to leave one’s king in check at the end of turn or even to move the king into check. But the trouble with this position is not that White can somehow move to avoid check, but rather simply that White has no moves at all. There are no White moves, not even moves that leave White in check. But therefore, in simplified chess, this position is a stalemate, rather than a Black win. In particular, Black will not actually be able to capture the White king, since White has no moves, and so the game will not proceed to that juncture.

Conclusion: checkmate in contemporary chess is not always the same thing as being necessarily able to capture the opposing king on the next move in simplified chess.

Of course, perhaps in simplified chess, one wouldn’t regard stalemate as a draw, but as a win for the player who placed the opponent in stalemate. That would be fine by me (and it would align with the rules currently used in draughts, where one loses when there is no move available), but it doesn’t negate my point. The position above would still be a Black win under that rule, sure, but still Black wouldn’t be able to capture the White king. That is, my claim would stand that checkmate (in modern rules) is not the same thing as the necessary possibility to capture the opposing king.

I had posted a tweet about this idea earlier today, but opted for a fuller and more careful explanation in this blog post.

On Twitter, user Gro-Tsen pointed out a similar situation arises with stalemates. Namely, consider the following position, with White to play:

Perhaps Black had just played qB5, which perhaps was a blunder, since now the White king sits in stalemate, with no move. So in the usual chess rules, this is a drawn position.

But in simplified chess, according to the rules I have described, the White king is not yet explicitly captured, and so he is free still to move around, to A6, A8, B6, or B7 (moving into check), or also capturing the black rook at B8. But in any case, whichever move White makes, Black will be able to capture the White king on the next move. So in the simplified chess rules, this position is a win for Black in one move, and not a draw.

Furthermore, this position therefore shows once again that checkmate (in normal rules) is not the same as the necessary possibility to capture the king (in simplified rules), since this is a position where Black has a necessary possibility to capture the White king on the next move in simplified chess, but it is not checkmate in ordinary chess, being rather stalemate.
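The logic of the two counterexamples can be separated from the board details with a small Python sketch. This is not a chess engine: it is a hypothetical abstract model in which a position is summarised, for the player to move, by whether the king is attacked and by a list with one entry per move available under the simplified rules, each entry recording whether the king can be captured after that move. The class and function names are assumptions of the sketch, and the move counts are taken from the descriptions above (no moves at all in the first position, five king moves in the second).

```python
from dataclasses import dataclass


@dataclass
class Position:
    """Abstract summary of a position for the player to move.

    in_check: the mover's king is currently attacked.
    replies:  one boolean per move available under simplified rules;
              True means the king can be captured after that move.
    """
    in_check: bool
    replies: list


def modern_legal_moves(pos):
    # Modern rules forbid moves that leave one's own king capturable.
    return [r for r in pos.replies if not r]


def is_checkmate(pos):
    return pos.in_check and not modern_legal_moves(pos)


def is_stalemate(pos):
    return not pos.in_check and not modern_legal_moves(pos)


def forced_capture_next_move(pos):
    # The mover must actually have a move, and every move must allow capture.
    return len(pos.replies) > 0 and all(pos.replies)


# First example: White is in check and has no moves whatsoever.
first = Position(in_check=True, replies=[])
# Second example: White is not in check; all five king moves walk into check.
second = Position(in_check=False, replies=[True] * 5)

print(is_checkmate(first), forced_capture_next_move(first))
print(is_stalemate(second), is_checkmate(second), forced_capture_next_move(second))
```

In this model the first position is checkmate without a forced capture, and the second is stalemate (not checkmate) with a forced capture, which are the two directions in which the proposed equivalence fails.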

TL;DR: Ultimately, my view is that our current rules of chess likely evolved from simplified rules and the idea that checkmate is what occurs when you have a necessary possibility of capturing the enemy king on the next move in those rules. But nevertheless, example chess positions show that these two notions are not quite exactly the same.