I’d like to share a simple proof I’ve discovered recently of a surprising fact: there is a universal algorithm, capable of computing any given function!

Wait, what? What on earth do I mean? Can’t we prove that some functions are not computable? Yes, of course.

What I mean is that there is a universal algorithm, a Turing machine program capable of computing any desired function, if only one should run the program in the right universe. There is a Turing machine program $p$ with the property that for any function $f:\newcommand\N{\mathbb{N}}\N\to\N$ on the natural numbers, including non-computable functions, there is a model of arithmetic or set theory inside of which the function computed by $p$ agrees exactly with $f$ on all standard finite input. You have to run the program in a different universe in order that it will compute your desired function $f$.

$\newcommand\ZFC{\text{ZFC}}

\newcommand\PA{\text{PA}}

\newcommand\Con{\mathop{\text{Con}}}

\newcommand\proves{\vdash}

\newcommand{\concat}{\mathbin{{}^\smallfrown}}

\newcommand\restrict{\upharpoonright}

$

**Theorem** There is a Turing machine program $p$, carrying out the specific algorithm described in the proof, such that for any function $f:\N\to\N$, there is a model of arithmetic $M\models\PA$, or indeed a model of set theory $M\models\ZFC$ or more (if consistent), such that the function computed by program $p$ inside $M$ agrees exactly with $f$ on all standard finite input.

The proof is elementary, relying essentially only on the ideas of the classical proof of the Gödel-Rosser theorem. To briefly review, for any computably axiomatized theory $T$ extending $\PA$, there is a corresponding sentence $\rho$, called the *Rosser* sentence, which asserts, “for any proof of $\rho$ in $T$, there is a smaller proof of $\neg\rho$.” That is, by smaller, I mean that the Gödel-code of the proof is smaller. One constructs the sentence $\rho$ by a simple application of the Gödel fixed-point lemma, just as one constructs the usual Gödel sentence that asserts its own non-provability. The basic classical facts concerning the Rosser sentence include the following:

- If $T$ is consistent, then so are both $T+\rho$ and $T+\neg\rho$
- $\PA+\Con(T)$ proves $\rho$.
- The theories $T$, $T+\rho$ and $T+\neg\rho$ are equiconsistent.
- If $T$ is consistent, then $T+\rho$ does not prove $\Con(T)$.

The first statement is the essential assertion of the Gödel-Rosser theorem, and it is easy to prove: if $T$ is consistent and $T\proves\rho$, then the proof would be finite in the meta-theory, and so since $T$ would have to prove that there is a smaller proof of $\neg\rho$, that proof would also be finite in the meta-theory and hence an actual proof, contradicting the consistency of $T$. Similarly, if $T\proves\neg\rho$, then the proof would be finite in the meta-theory, and so $T$ would be able to verify that $\rho$ is true, and so $T\proves\rho$, again contradicting consistency. By internalizing the previous arguments to PA, we see that $\PA+\Con(T)$ will prove that neither $\rho$ nor $\neg\rho$ are provable in $T$, making $\rho$ vacuously true in this case and also establishing $\Con(T+\rho)$ and $\Con(T+\neg\rho)$, for the second and third statements. In particular, $T+\Con(T)\proves\Con(T+\rho)$, which implies that $T+\rho$ does not prove $\Con(T)$ by the incompleteness theorem applied to the theory $T+\rho$, for the fourth statement.

Let’s now proceed to the proof of the theorem. To begin, we construct what I call the *Rosser tree* over a c.e. theory $T$. Namely, we recursively define theories $R_s$ for each finite binary string $s\in 2^{{<}\omega}$, placing the initial theory $R_{\emptyset}=T$ at the root, and then recursively adding either the Rosser sentence $\rho_s$ for the theory $R_s$ or its negation $\neg\rho_s$ at each stage to form the theories at the next level of the tree.

$$R_{s\concat 1}=R_s+\rho_s$$

$$R_{s\concat 0}=R_s+\neg\rho_s$$

Each theory $R_s$ is therefore a finite extension of $T$ by successively adding the appropriate Rosser sentences or their negations in the pattern described by $s$. If the initial theory $T$ is consistent, then it follows by induction using the Gödel-Rosser theorem that all the theories $R_s$ in the Rosser tree are consistent. Extending our notation to the branches through the tree, if $f\in{}^\omega 2$ is an infinite binary sequence, we let $R_f=\bigcup_n R_{f\upharpoonright n}$ be the union of the theories arising along that branch of the Rosser tree. In this way, we have constructed a perfect set of continuum many distinct consistent theories.

I shall now describe a universal algorithm for the case of computing binary functions. Consider the Rosser tree over the theory $T=\PA+\neg\Con(\PA)$. This is a consistent theory that happens to prove its own inconsistency. By considering the Gödel-codes in order, the algorithm should begin by searching for a proof of the Rosser sentence $\rho_{\emptyset}$ or its negation in the initial theory $R_{\emptyset}$. If such a proof is ever found, then the algorithm outputs $0$ or $1$ on input $0$, respectively, depending on whether it was the Rosser sentence or its negation that was found first, and moves to the next theory in the Rosser tree by adding the opposite statement to the current theory. Then, it starts searching for a proof of the Rosser sentence of *that* theory or its negation. At each stage in the algorithm, there is a current theory $R_s$, depending on which prior proofs have been found, and the algorithm searches for a proof of $\rho_s$ or $\neg\rho_s$. If found, it outputs $0$ or $1$ accordingly (on input $n=|s|$), and moves to the next theory in the Rosser tree by adding the opposite statement to the current theory.

If $f:\N\to 2=\{0,1\}$ is any binary function on the natural numbers, then let $R_f$ be the theory arising from the corresponding path through the Rosser tree, and let $M\models R_f$ be a model of this theory. I claim that the universal algorithm I just described will compute exactly $f(n)$ on input $n$ inside this model. The thing to notice is that because $\neg\Con(\PA)$ was part of the initial theory, the model $M$ will think that all the theories in the Rosser tree are inconsistent. So the model will have plenty of proofs of every statement and its negation for any theory in the Rosser tree, and so in particular, the function computed by $p$ in $M$ will be a total function. The question is which proofs will come first at each stage, affecting the values of the function. Let $s=f\restrict n$ and notice that $R_s$ is true in $M$. Suppose inductively that the function computed by $p$ has worked correctly below $n$ in $M$, and consider stage $n$ of the procedure. By induction, the current theory will be exactly $R_s$, and the algorithm will be searching for a proof of $\rho_s$ or its negation in $R_s$. Notice that $f(n)=1$ just in case $\rho_s$ is true in $M$, and because of what $\rho_s$ asserts and the fact that $M$ thinks it is provable in $R_s$, it must be that there is a smaller proof of $\neg\rho_s$. So in this case, the algorithm will find the proof of $\neg\rho_s$ first, and therefore, according to the precise instructions of the algorithm, it will output $1$ on input $n$ and add $\rho_s$ (the opposite statement) to the current theory, moving to the theory $R_{s\concat 1}$ in the Rosser tree. Similarly, if $f(n)=0$, then $\neg\rho_s$ will be true in $M$, and the algorithm will therefore first find a proof of $\rho_s$, give output $0$ and add $\neg\rho_s$ to the current theory, moving to $R_{s\concat 0}$. In this way, the algorithm finds the proofs in exactly the right way so as to have $R_{f\restrict n}$ as the current theory at stage $n$ and thereby compute exactly the function $f$, as desired.

Basically, the theory $R_f$ asserts exactly that the proofs will be found in the right order in such a way that program $p$ will exactly compute $f$ on all standard finite input. So every binary function $f$ is computed by the algorithm in any model of the theory $R_f$.

Let me now explain how to extend the result to handle all functions $g:\N\to\N$, rather than only the binary functions as above. The idea is simply to modify the binary universal algorithm in a simple way. Any function $g:\N\to \N$ can be coded with a binary function $f:\N\to 2$ in a canonical way, for example, by having successive blocks of $1$s in $f$, separated by $0$s, with the $n^{\rm th}$ block of size $g(n)$. Let $q$ be the algorithm that runs the binary universal algorithm described above, thereby computing a binary sequence, and then extract from that binary sequence a corresponding function from $\N$ to $\N$ (this may fail, if for example, the binary sequence is finite or if it has only finitely many $0$s). Nevertheless, for any function $g:\N\to \N$ there is a binary function $f:\N\to 2$ coding it in the way we have described, and in any model $M\models R_f$, the binary universal algorithm will compute $f$, causing this adapted algorithm to compute exactly $g$ on all standard finite input, as desired.

Finally, let me describe how to extend the result to work with models of set theory, rather than models of arithmetic. Suppose that $\ZFC^+$ is a consistent c.e. extension of ZFC; perhaps it is ZFC itself, or ZFC plus some large cardinal axioms. Let $T=\ZFC^++\neg\Con(\ZFC^+)$ be a slightly stronger theory, which is also consistent, by the incompleteness theorem. Since $T$ interprets arithmetic, the theory of Rosser sentences applies, and so we may build the corresponding Rosser tree over $T$, and also we may undertake the binary universal algorithm using $T$ as the initial theory. If $f:\N\to 2$ is any binary function, then let $R_f$ be the theory arising on the corresponding branch through the Rosser tree, and suppose $M\models R_f$. This is a model of $\ZFC^+$, which also thinks that $\ZFC^+$ is inconsistent. So again, the universal algorithm will find plenty of proofs in this model, and as before, it will find the proofs in just the right order that the binary universal algorithm will compute exactly the function $f$. From this binary universal algorithm, one may again design an algorithm universal for all functions $g:\N\to\N$, as desired.

One can also get another kind of universality. Namely, there is a program $r$, such that for any finite $s\subset\N$, there is a model $M$ of $\PA$ (or $\ZFC$, etc.) such that inside the model $M$, the program $r$ will enumerate the set $s$ and nothing more. One can obtain such a program $r$ from the program $p$ of the theorem: just let $r$ run the universal binary program $p$ until a double $0$ is produced, and then interprets the finite binary string up to that point as the set $s$ to output.

Let me now also discuss another form of universality.

**Corollary**

There is a program $p$, such that for any model $M\models\PA+\Con(\PA)$ and any function $f:M\to M$ that is definable in $M$, there is an end-extension of $M$ to a taller model $N\models\PA$ such that in $N$, the function computed by program $p$ agrees exactly with $f$ on input in $M$.

**Proof**

We simply apply the main theorem inside $M$. The point is that if $M$ thinks $\Con(\PA)$, then it can build what it thinks is the tree of Rosser extensions, and it will think that each step maintains consistency. So the theory $T_f$ that it constructs will be consistent in $M$ and therefore have a model (the Henkin model) definable in $M$, which will therefore be an end-extension of $M$.

**QED**

This last application has a clear affinity with a theorem of Woodin’s, recently extended by Rasmus Blanck and Ali Enayat. See Victoria Gitman’s posts about her seminar talk on those theorems: Computable processes can produce arbitrary outputs in nonstandard models, continuation.

**Alternative proof**. Here is an alternative elegant proof of the theorem based on the comments below of Vadim Kosoy. Let $T$ be any consistent computably axiomatizable theory interpreting PA, such as PA itself or ZFC or what have you. For any Turing machine program $e$, let $q(e)$ be a program carrying out the following procedure: on input $n$, search systematically for a finite function $h:X\to\mathbb{N}$, with $X$ finite and $n\in X$, and for a proof of the statement “program $p$ does not agree with $h$ on all inputs in $X$,” using the function $h$ simply as a list of values for this assertion. For the first such function and proof that is found, if any, give as output the value $h(n)$.

Since the function $e\mapsto q(e)$ is computable, there is by Kleene’s recursion theorem a program $p$ for which $p$ and $f(p)$ compute the same function, and furthermore, $T$ proves this. So the program $p$ is searching for proofs that $p$ itself does not behave in a certain way, and then it is behaving in that way when such a proof is found.

I claim that the theory $T$ does not actually prove any of those statements, “program $p$ does not agree with $h$ on inputs in $X$,” for any particular finite function $h:X\to\mathbb{N}$. If it did prove such a statement, then for the smallest such function and proof, the output of $p$ would indeed be $h$ on all inputs in $X$, by design. Thus, there would also be a proof that the program *did* agree with this particular $h$, and so $T$ would prove a contradiction, contrary to our assumption that it was consistent. So $T$ actually proves none of those statements. In particular, the program $p$ computes the empty function in the standard model of arithmetic. But also, for any particular finite function $h:X\to\mathbb{N}$, we may consistently add the assertion “program $p$ agrees with $h$ on inputs in $X$” to $T$, since $T$ did not refute this assertion.

For any function $f:\mathbb{N}\to\mathbb{N}$, let $T_f$ be the theory $T$ together with all assertions of the form “program $p$ halts on input $n$ with value $k$”, for the particular value $k=f(n)$. I claim that this theory is consistent, for if it is not, then by compactness there would be finitely many of the assertions that enable the inconsistency, and so there would be a finite function $h:X\to\mathbb{N}$, with $h=f\upharpoonright X$, such that $T$ proved the program $p$ does not agree with $h$ on inputs in $X$. But in the previous paragraph, we proved that this doesn’t happen. And so the theory $T_f$ is consistent.

Finally, note that in any model of $T_f$, the program $p$ computes the function $f$ on standard input, because these assertions are exactly made in the theory. **QED**

I’m just starting to read this, but it looks beautiful!!!

One typo so far:

“both T+ρ an T+¬ρ”.

Thanks for the compliment! I’m glad you like it.

(And I’ve now fixed that typo you mentioned.)

Thank you, John, for your excellent post on this phenomenon on your blog—It’s great! https://johncarlosbaez.wordpress.com/2016/04/02/computing-the-uncomputable/.

There is a comment stream on Reddit: https://www.reddit.com/r/math/comments/4c5ikz/every_function_can_be_computable_if_you_run_this/

Pingback: Computing the Uncomputable | Azimuth

There is an old classical result that if I recall has been published several times, asserting that there is an effectively given infinite sequence of Pi-0-1 sentences such that no nontrivial propositional combination of them can be proved in PA. A trivial reformulation of this classical result is the following. There is a Turing machine computing a partial recursive function f (actually the empty function in reality) such that for any list of requirements f(0) is defined (undefined), f(1) is defined (undefined), f(2) is defined (undefined), …, there is a model of PA meeting all those requirements. (I.e., there are continually many such requirements, and they can all be met in a single model of PA using the compactness theorem from predicate logic).

Now consider the following partial recursive function g. To compute g(i), look for the least j such that you find, when computing f, that f at (p_i)^(j+1) has a value. And output j. Here p_i is the (i+1)-st prime number and ^ is exponentiation.

Then in analogy with that property of f, we see that for any list of requirements g(0) = n_0, g(1) = n_1, g(2) = n_2, …, there is a model of PA meeting all those requirements.

Harvey Friedman

Thanks for your comment! I agree with all that, and yes, my argument with the Rosser tree amounts to a proof of that claim you mention. The $n^{\rm th}$ $\Pi^0_1$ sentence is: at stage $n$ of the Rosser tree, the proof of the Rosser sentence arising at level $n$ of the tree appears before any proof of its negation. These are independent precisely because every branch through the Rosser tree is a consistent theory.

The result proved by Joel, as Harvey points out, is an immediate consequence of putting the compactness theorem for first order logic together with the existence of an effectively given infinite sequence of Pi-0-1 sentences such that no nontrivial propositional combination of them can be proved in PA; the latter result was proved in the early 1960s, independently by Mostowski and Kripke. Rasmus Blanck has recently written a survey paper on this topic.

However, it should be pointed out that Joel’s proof of the existence of the aforementioned effective sequence of Pi-0-1 sentences is new and perspicuous.

I wonder if credit shouldn’t go even further back? After all, surely Rosser must have observed that one can iterate his theorem and thereby build an entire tree of logically independent $\Pi^0_1$ assertions, what I called the Rosser tree. That would push the result to the 1930s.

Well, I was referring to the *statement* of the theorem itself that you have established rather than the method of proof when I referred to the the work of Mostowski and Kripke in the 1960s.

Mostowski’s proof is nowhere as clear as yours, but it does use a Rosser-style argument, His result is a bit stronger since he produces an independent sequence over an effectively given sequence of consistent theories rather than a single theory. Kripke’s proof, on the other hand, uses recursion-theoretic ideas (essentially the second recursion theorem).

One more point: I agree that Rosser’s theorem immediately proves the existence of continuum many consistent extensions of any sufficiently strong computably axiomatizable theory such as PA, and that this must have been noticed after Rosser’s proof came out in 1936, but more argumentation is needed to show that there is a single unary Pi-0-1 formula such that no nontrivial propositional combination of its numerical instances can be proved in PA; it is precisely this fact that was proved by Mostowski in 1961, and which you have found a transparent proof for. However, the *model theoretic* corollaries that you draw from the existence of such a formula are immediate and well-known.

A very nice theorem. There is something I cannot understand; The Church-Turing thesis if understood correctly says Every natural concept of computability computes exactly those functions which are computable in Turing machine.

The aforementioned theorem yields different concepts of computability that each one computes different functions. The people living in $\maythcal{M}$ compute different functions that people in $\mathcal{N}$ computes.

Does it mean theorem produce infinitely many concept of computability using $p$ and $\mathcal{M}$.

I guess that the models of $PA$ which theorem guarantee cannot be computable, I mean we cannot make two mathematical concept of computability to violet the Church-Turing thesis, but on the other hand the thesis is not a mathematical statement.

How we know that the concept of computability is fixed along the time?

Maybe 1000000 years later mathematicians live in a specific model.

I am thankful if you clarify my mistake.

A very nice theorem. There is something I cannot understand; The Church-Turing thesis if understood correctly says Every natural concept of computability computes exactly those functions which are computable in Turing machine.

I think the aforementioned theorem yields different concepts of computability that each one computes different functions. The people living in $\maythcal{M}$ compute different functions that people in $\mathcal{N}$ computes.

Does it mean theorem produce infinitely many concept of computability using $p$ and $\mathcal{M}$.

I guess that the models of $PA$ which theorem guarantee cannot be computable, I mean we cannot make two mathematical concept of computability to violet the Church-Turing thesis, but on the other hand the thesis is not a mathematical statement.

How we know that the concept of computability is fixed along the time?

Maybe 10000000000000 years later mathematicians live in a specific model.

I am thankful if you clarify my mistake.

It seems to me that the following proof is a bit simpler. Please tell me if I have a mistake somewhere.

Let’s fix any base theory $T$ which is powerful enough to define natural numbers and computations (e.g. it can be $PA$ or $ZFC$). Let’s also fix a way to encode proofs in $T$ as natural numbers. Our program $p$ uses quining to search for proofs of sentences of the form “On the finite set $X \subseteq \mathbb{N}$, $p$ doesn’t realize the function $h: X \rightarrow \mathbb{N}$”. Here $X$ is any finite set given explicitly by listing the binary notations of its elements and $h$ is any function given explicitly as a look-up table. “Doesn’t realize” means that for some $k \in X$, $p$ either fails to halt or produces an answer different than $h(k)$. $p$ running on input $n$ iterates over all numbers in its search for proofs. Once it encounters a number which is a valid proof of the form above, it checks whether $n \in X$. If $n \in X$, it outputs $h(n)$ and terminates. Otherwise, it keeps running.

Assuming $T$ is consistent, for any $X$ and $h: X \rightarrow \mathbb{N}$ it is consistent to add the axiom “p realizes $h$” to $T$. Therefore for any function $f$ we can consistently add the infinite set of axioms “p realizes $f$ restricted to $X$” for all finite $X$. Therefore, there is a model in which these axioms hold and $p$ realizes $f$ on the whole of $\mathbb{N}$.

Denote “$p$ realizes $h$” by $\phi_{p,h}$. If $\phi_{p,h}$ is inconsistent with $T$ then $T \vdash \neg \phi_{p,h}$ (I am *do* assume proof by contradiction is allowed in $T$). Assume that there is at least one $h$ s.t. $T \vdash \neg \phi_{p,h}$. Let $h^*$ be s.t. that among all such $h$, the proof of $\neg \phi_{p,h^*}$ is represented by the lowest number. When running $p$ on any $n$ in the domain of $h^*$ this proof will be encountered first hence $p$ will produce the output $h^*(n)$. Therefore $p$ realizes $h^*$. Since any finite computation can be transformed into a proof, it follows that $T \vdash \phi_{p,h^*}$. We got that $T$ is inconsistent contrary to our assumption.

The reason we cannot do it twice in incompatible ways is that the construction of $p$ depends on $T$. Suppose $h$ and $h’$ are incompatible. Denoting $T’ := T + \phi_{p,h}$ we get a new program $p’$ associated with $T’$. Thus $\phi_{p,h’}$ is inconsistent with $T’$ but $\phi_{p’,h’}$ is consistent with $T’$.

On the other hand, if $h$ and $h’$ are compatible then we can construction a function with finite domain $h”$ s.t. $h$ and $h’$ are both its restrictions. $\phi_{p,h”}$ is consistent with $T$ and implies $\phi_{p,h}$ and $\phi_{p,h’}$ thus both of the latter sentences can be added as axioms together.

I like your proof a lot—I find it quite elegant! I have edited my post to include a version of it at the end.

Pingback: A program that accepts any desired finite set, in the right universe | Joel David Hamkins

Pingback: The universal algorithm: a new simple proof of Woodin’s theorem | Joel David Hamkins

Pingback: The universal definition — it can define any mathematical object you like, in the right set-theoretic universe | Joel David Hamkins

Pingback: Dialectical Programming – Codical Theory