A lifetime of hot air

We’ve been making some Fermi estimations in the Math for Liberal Arts class I am teaching, and today we discuss the following question:

Question. If you collect all the hot air that you have breathed in your life, what would the volume be? If you made a hot-air balloon, would it be able to lift you and all your possessions?

To answer, let’s start with the first part. How much do I breathe? If I imagine inhaling and then exhaling a deep, big breath, I figure I could inflate a small paper bag, perhaps well over one liter, but probably not as much as two liters. But my passive resting breathing is probably much less than a big deep breath. So let’s figure a half liter per ordinary passive breath. How often do I breathe? Well, in the swimming pool, I can hold my breath under water for a minute or even two minutes (in my younger swim-team days); but if I hold my breath right now, I can say that it does start to feel a little unnatural, like I should take my next breath, even after just about five or ten seconds, even though this impulse could be resisted longer. It seems to me that my body wants to take another breath in about that time. If we breathe every five seconds, that would mean 12 breaths per minute, so let’s say ten breaths per minute, which would mean a volume of 5 liters per minute. That makes $5\times 60=300$ liters per hour, or $300\times 24=7200$ liters per day. In a year, this would be $7200\times 365$ liters, which is a little less than $7000\times 400=2,800,000$ liters per year. Let’s say 2.5 million liters per year of hot air. Times 50 years would make $125$ million liters of hot air in all.
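
If you care to fiddle with the assumptions, here is the whole chain of estimates as a small Python sketch. Every input is just a guess from the paragraph above; exact arithmetic lands near 130 million liters, with the 125 million figure coming from rounding the yearly total down to 2.5 million first.

```python
# A back-of-the-envelope tally of a lifetime of breathing.
# Every number here is a rough guess from the discussion above.
liters_per_breath = 0.5       # an ordinary passive breath
breaths_per_minute = 10       # roughly one breath every six seconds
years = 50

liters_per_day = liters_per_breath * breaths_per_minute * 60 * 24
liters_per_year = liters_per_day * 365
lifetime_liters = liters_per_year * years

print(f"{liters_per_day:,.0f} liters per day")            # 7,200
print(f"{liters_per_year:,.0f} liters per year")          # 2,628,000
print(f"{lifetime_liters:,.0f} liters in {years} years")  # 131,400,000
```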

Now, each liter of air fills a cube 10 cm on a side, and one thousand of these fit into a cubic meter. So we’ve got 125,000 cubic meters of hot air. This is the same as a cube 50 meters by 50 meters by 50 meters. That is my hot-air balloon! Filled with a lifetime of hot air. (This is more than one hundred times as large as a typical recreational hot-air balloon, which I understand is usually under 1000 cubic meters. From this point of view, it would seem likely that it could lift me and all my possessions, although body temperature may be much less than is achieved in those balloons.)

If the air were at my body temperature ($98.6^\circ$ F), would it be able to lift me and all my possessions? Well, let’s see how much it would lift. Hot air expands in proportion to its absolute temperature, measured from absolute zero. On a day like today, about $50^\circ$ F, the difference from body temperature is about 50 degrees F, and absolute zero is about minus 460 F. So this is about a 10 percent increase in absolute temperature. (In metric: body temperature is about 37 degrees Celsius, it is about 10 degrees Celsius today, a difference of 27 degrees, and absolute zero is about minus 273, so again about a ten percent increase, as I had said.)

So the heat of the hot air caused it to expand in volume by ten percent. The buoyant force of the hot-air balloon is exactly the weight of the displaced air, by the Archimedean principle. Thus, the lifting force of my hot-air balloon will be equal to the mass of air filling ten percent of the volume we computed. How much does air weigh? I happen to remember from my high school science class that one mole of air at one atmosphere of pressure is about 22 liters (my teacher had a cube of exactly that size sitting on his desk, to help us to visualize it). And I also know that air is mainly nitrogen, which forms the molecule $N_2$, and since nitrogen has an atomic weight of 14, the molecule $N_2$ has a mass of 28 grams per mole. So air weighs about 28 grams per 22 liters, which is about 1.3 grams per liter. Each cubic meter is one thousand liters, and so air weighs about 1.3 kilograms per cubic meter (this is much larger than I had expected—air weighs more than one kilogram per cubic meter!). My hot air in total was 125,000 cubic meters, and we said that because of the temperature difference, the volume expanded by ten percent, or 12,500 cubic meters. This expansion displaces an equal volume of air, which weighs 1.3 kg per cubic meter. Thus, the displaced air weighs $12,500\times 1.3\approx 16,000$ kilograms, or about 16 metric tons. So all my hot air, at body temperature in a giant hot-air balloon on a chilly day, would have a lifting force able to lift 16 metric tons.
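
Here is the lift calculation as a short Python sketch, using the same rounded inputs as above (body temperature, a roughly 10-degree Celsius day, and 1.3 kilograms per cubic meter of air):

```python
# Buoyant lift of a lifetime of body-temperature air on a chilly day.
# Warm air expands in proportion to absolute temperature, and the lift
# equals the weight of the displaced air, as argued above.
volume_m3 = 125_000          # lifetime of hot air, from the earlier estimate
body_K = 273 + 37            # body temperature, absolute
outside_K = 273 + 10         # a chilly ~50 F day
air_kg_per_m3 = 1.3          # ~28 g per 22 L, computed above

expansion = body_K / outside_K - 1          # ~0.095, about ten percent
lift_kg = volume_m3 * expansion * air_kg_per_m3

print(f"expansion: {expansion:.1%}")        # 9.5%
print(f"lift: {lift_kg:,.0f} kg")           # ~15,500 kg, about 16 metric tons
```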

Would this lift me and all my possessions? Do I own 16 tons of stuff? Well, thankfully, I don’t own a car, which would be a ton or more by itself. But I do own a lot of books, a piano, an oven, a dishwasher, some heavy furniture, paintings, and various other items, as well as a collection of large potted plants on my terrace. It seems likely to me that I could fit most if not all my possessions within 16 tons. To gain a little confidence in this, let’s estimate the mass of my books. My wife and I have about twelve large shelves filled with books, each about 2 meters wide, and then I have about 3 more such shelves filled with books in my offices at the university. If we count half of the home books as mine, plus my office books, that makes 9 shelves times two meters, for about 20 meters of books. If the books are about 25 cm tall on average, and 15 cm deep, that makes $20\times .25\times .15=.75$ cubic meters of books. Let’s round up to one cubic meter of books. How much does a cubic meter of paper weigh? Well, one ream of copy paper weighs about 2 kilograms, and that is a volume of 8.5 by 11 by 2 inches. One meter is about 39 inches, and so we could fill one cubic meter with a pile of reams of copy paper about 4 by 3 by 19, which would be over 200 reams, or roughly 450 kilograms. So not even a half ton of books! So I can definitely lift all my most important possessions within the 16 tons.
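
And here is the book estimate in the same style, a sketch that uses copy paper as a stand-in for book paper and the corrected figure of about 39 inches to the meter:

```python
# Mass of my books, using copy paper as a stand-in for book paper.
shelf_meters = 20                        # about 9 shelves of 2 meters each
book_m3 = shelf_meters * 0.25 * 0.15     # 25 cm tall, 15 cm deep -> 0.75 m^3

reams_per_m3 = 4 * 3 * 19                # 8.5 x 11 x 2 inch reams in a 39 inch cube
kg_per_m3 = reams_per_m3 * 2             # ~2 kg per ream -> ~450 kg per m^3

print(f"{book_m3:.2f} cubic meters of books")
print(f"~{book_m3 * kg_per_m3:,.0f} kg of books")   # well under half a metric ton
```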

Final answer: Yes, if we filled a giant balloon with all the hot air I have breathed in my life, at body temperature, then it would lift me and all my possessions.

On number sense — Would one day of NYC coffee fill the Statue of Liberty? And other fun questions…

Today I gave a lecture on what I call number sense, using a process of estimation and approximation in order to calculate various unknown quantities, including a few fantastical ones. How much coffee is made per day in New York City? Would it fill up the Statue of Liberty? Approximately how many babies are born in New York City each day? If you made a stack of quarters to reach the distance to the moon, what would the dollar-value be? If you piled those quarters in a heap, would it fit in Central Park? How much does the Empire State Building weigh?

These kinds of back-of-the-envelope calculations, in my view, have at their heart the idea that one can solve a difficult or seemingly impossible problem by breaking it into more manageable pieces. We don’t just pull a final answer out of the air, but rather make simplifying assumptions and informed estimates for related quantities that we are more familiar with or have knowledge about, and then use that information to derive a better estimate for the main quantity. For most of these questions, at the outset we may have little idea what would be a reasonable answer, but by the end, we gain some insight and find ourselves a little closer to the truth.

These kinds of calculations are also known as Fermi estimations, in light of Fermi’s remarkable ability to make surprisingly accurate estimations on the basis of little or no hard data. The Wikipedia page (thanks to Artie Prendergrast-Smith for mentioning this link in the comments) emphasizes that even in a case where the estimate is significantly off the true value, we may still find value in the Fermi calculation, because it focuses our attention on the reasons for the divergence. In discovering which of the assumptions underlying our calculation was wrong, we come to a deeper understanding of the true situation.

In the lecture, I began with some very easy cases. For example, how many seats were in the auditorium? The students estimated that there were approximately 12 seats per row and about 10 rows, so 120 seats in all. How old was one of the students, in seconds? Well, he was 18 years old, and so we could simply multiply each year by 365 days, times 24 hours per day, times 60 minutes per hour, times 60 seconds per minute, to get
$$18\times 365\times 24\times 60\times 60\approx 600,000,000 \text{ seconds}.$$
One student objected about leap days, since 365 should be 365.25 or so. But I pointed out that this difference was not as important as it might seem, since already we had made far larger rounding assumptions. For example, the student was not exactly 18 years old, but 18 years old and some several months; by using 18 years only, we made a bigger difference in the answer than caused by the leap-day issue, which would be a difference of only five days or so over 18 years. For the same reason, we should feel free to round the numbers to make the calculation easier. We are aiming at a ballpark estimate rather than an exact answer.

Let’s now do some more interesting cases.

Coffee in New York. How much coffee is made each day in New York? Would it fill the Statue of Liberty? First, let me say that I really don’t have any definite information about how much coffee is made each day in New York, and I fear that my own coffee-obsessed perspective will lead me to over-estimate the amount, but let’s give it a try. New York City has a population of approximately 10 million people. Some of those people, like myself, drink a large amount of coffee each day, but many of the others do not drink coffee at all. I would think that a sizable fraction of the NYC population drinks coffee daily, perhaps as much as a third or even half. Many of those coffee-drinkers have more than one cup per day, and surely more coffee is made than consumed. So it seems reasonable to me to estimate that we have approximately one medium cup of coffee per person on average per day in New York. Basically, we’re saying that the heavy coffee drinkers and the made-but-not-sold coffee approximately make up for those who abstain, making the average about one cup per person. So we are talking about 10 million cups of coffee per day. A medium cup of brewed coffee at Starbucks is, I think, about 12 to 16 ounces, a little less than a pint, so let’s say about 3 cups per liter. This amounts to roughly 3 million liters of coffee.

Would it fill the Statue of Liberty? The statue itself is, I estimate, about twenty stories tall, counting the base, and if each story is 15 ft, or 5 meters, that would mean 100 meters tall in all. But I think that the base is about half the height, so let’s say 50 meters for the actual statue itself. I’ve never been inside the statue, but my students say that it is about 10 meters across inside, a little more at the bottom than near the top. If we approximated it as a rectangular solid, that would give a volume of $10\times 10\times 50$ cubic meters, or 5000 cubic meters. But since the statue tapers as you go up, particularly in the arm holding the torch, it really is more like a cone than a rectangular solid, and so we should divide by three. But let’s divide just by two, because she isn’t quite as tapered as a cone. So the Statue of Liberty has a volume of approximately 2500 cubic meters. One cubic meter can be thought of as a 10 by 10 by 10 array of little 10 cm cubes, and each of those is exactly one liter. So a cubic meter is 1000 liters, and therefore the Statue of Liberty has a volume of $2500\times 1000=2.5$ million liters. But since we had 3 million liters of coffee, the answer our calculation arrives at is: Yes, one day’s worth of New York coffee would fill up the Statue of Liberty!
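
Here are both estimates together in a short Python sketch; the taper factor of one half and all the other inputs are just the guesses made above:

```python
# One day of NYC coffee versus the volume of the Statue of Liberty.
population = 10_000_000      # rough NYC population
cups_per_person = 1          # heavy drinkers balance abstainers
cups_per_liter = 3           # a medium cup is a bit under half a liter

coffee_liters = population * cups_per_person / cups_per_liter

height_m, width_m, depth_m = 50, 10, 10   # statue proper, without the base
taper_factor = 0.5                        # between a box (1) and a cone (1/3)
statue_liters = height_m * width_m * depth_m * taper_factor * 1000

print(f"coffee: {coffee_liters/1e6:.1f} million liters")   # ~3.3
print(f"statue: {statue_liters/1e6:.1f} million liters")   # ~2.5
print("coffee fills the statue:", coffee_liters > statue_liters)
```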

Well, we do not have perfect confidence in our estimates and assumptions — for example, perhaps there are many fewer coffee drinkers in New York than we estimated or perhaps we underestimated the volume of the Statue of Liberty. Since the estimated volumes were of basically similar magnitudes, we aren’t really entitled to say that the coffee would definitely fill up the Statue of Liberty. Rather, what we have come to know is that those two volumes are comparable in size; they are in the same ballpark.

Elevator trips. While riding downtown last weekend with my son on the subway, a crowded 4 train, we overheard the group standing next to us talking about elevators. One lady said, “My elevator company serves as many elevator trips in New York City in five days as the population of the entire world,” and the rest of her group, impressed, nodded affirmatively in reply. But my thoughts, upon hearing that, were to make a quick calculation. Suppose all 10 million NYC residents rode an elevator 10 times every day, which is way too high (probably one trip per person per day is more reasonable, since many people live and work in buildings without elevators). Even in this extreme case of ten trips per person per day, it would mean only 100 million trips total per day, or 500 million trips over 5 days. This is much less than the world population, and so no way is that person’s claim true, especially since there are also many elevator companies. I thought of mentioning my calculation to those people on the subway, but decided against it. Walking out of the subway in the East Village, however, I asked my son (14 years old) whether he heard those people talking about elevators, and he replied, “Oh, yes, and when they said that, I calculated it in my head: no way is that true.” He then proceeded to explain his calculation, similar to mine. Yay, Horatio!

The Chicago marathon. In the run-up to the Chicago marathon this year, on a route that would wind through the windy city streets, Newsweek magazine reported, “Chicago Marathon organizers expect 1.7 million fans to line the route this year.” (Thanks to the critical math commentary of Mark Iris for bringing this example to my attention.) The organizers had emphasized the economic impact of these spectators, many of whom would presumably be eating in Chicago restaurants and staying at Chicago hotels. But is this a reasonable number?

Let’s calculate. A marathon is approximately 26 miles, and the route has two sides for spectators, so let’s round it to 50 miles of spectator viewing spots. Each mile is about 1800 yards, so we have $50\times 1800=90,000$ yards of viewing spots. Each spectator, standing shoulder-to-shoulder, with all their stuff, takes up about a yard of space. So if the marathon were lined on both sides for the entire route with spectators standing shoulder-to-shoulder, this would amount to about 90,000 spectators. In order to have 1.7 million spectators, therefore, they would have to be lined up behind each other. But even if the spectators were 10 people deep on each side for the entire route, which is a vast crowd, it would still amount only to 900,000 people. We would have basically to double this to get to 1.7 million. So, if the live event really had 1.7 million spectators lining the route, then the route would have to be lined about 20 people deep on each side the whole way. No way is this number correct! I have never had the chance to attend the Chicago marathon, but at the New York marathon, which I assume is comparable, I know that while there are thick crowds at the finish line in Central Park and at some of the other prominent or especially interesting race locations, the crowds along most of the rest of the route are much thinner, and at the typical nothing-special location, one could easily expect to have a front-row spot.
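
Here is the sanity check in Python; note that using the exact 26 miles and two sides gives about 94,000 front-row spots, close to the rounded 90,000 above:

```python
# How deep would the crowd have to be to reach 1.7 million spectators?
route_miles = 26
yards_per_mile = 1800            # rounded up from 1760
sides = 2
front_row_spots = route_miles * yards_per_mile * sides   # 93,600, ~90,000

claimed = 1_700_000
required_depth = claimed / front_row_spots

print(f"front-row capacity: {front_row_spots:,}")
print(f"required depth: {required_depth:.0f} people")    # ~18 deep, everywhere
```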

How many bricks on the college campus? Our campus, covered with some lovely woods and green meadows, has a number of brick Georgian buildings. Most of these are the same standard size as the mathematics department, but there are also some larger buildings such as the library, the performing arts center, the campus center and the gymnasium. At my lecture, the students and I agreed that altogether it amounted to about 30 buildings of the size of the math building. This building is approximately 30 meters by 70 meters, which would make a perimeter of 200 meters, but because the building has wings and is not purely rectangular, let’s add another 100 meters for the folds in the walls, so about 300 meters around the base of the building. The building is two stories tall, about 10 meters tall, making the area of the walls about 3000 square meters. Of course, there are windows and doors that cut out of these walls, but let’s say that they are approximately accounted for by the extra bricks used in the trimming flourishes at the corners and top of the building. So we have 3000 square meters of brick. Each brick is approximately 10 cm by 30 cm, and so one square meter of brick would have about ten rows of a little more than three bricks across, or about 30 bricks. So each building has about $3000\times 30=90,000$ bricks. And with 30 buildings, this amounts to $30\times 90,000=2,700,000$ bricks in the buildings on campus. There are also some bricks in the central fountain, a few small brick walls and some bricks lining some of the walkways. So let’s add another ten percent for those extra bricks, arriving at about 3,000,000 bricks on campus in all.
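
As a check on the arithmetic, here is the estimate in Python, with the same inputs as above:

```python
# Bricks on campus, following the estimate above.
buildings = 30
wall_perimeter_m = 300       # rectangle perimeter plus folds in the walls
wall_height_m = 10           # two stories
bricks_per_m2 = 30           # 10 rows of a bit more than 3 bricks across

per_building = wall_perimeter_m * wall_height_m * bricks_per_m2
campus = buildings * per_building * 1.1    # +10% for fountain, walls, walkways

print(f"{per_building:,} bricks per building")   # 90,000
print(f"{campus:,.0f} bricks on campus")         # ~3.0 million
```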

Positive test result for a rare disease. Suppose that as part of your check-up, your doctor randomly administers a clinical test for a certain rare medical condition. The test is 99 percent accurate, in terms of false positives and false negatives, in the sense that 99 percent of people taking the test have an accurate result with the test. Suppose also that the disease itself is rare, held by only one in a million people. If your test comes back positive, what is the likelihood that you actually have the disease?

This is a subtle question. It might seem to make sense initially that there is a 99 percent chance that you have the disease, since that is the accuracy of the test. But this isn’t actually right, because it doesn’t account for the fact that the disease itself is extremely rare, and so the total number of false positives will actually far outweigh the true positive results. For example, suppose that one million people take the test. About one of these people actually has the disease, and that person is 99 percent likely to test positive. So we expect about one true positive result. And for the others, who don’t have the disease, 99 percent of them will test negative. In other words, about 990,000 people will test negative. The remaining 10,000 people, however, one percent of the total, will have false positive results! So you are in this group of 10,000 people who tested positive, with only one of them actually having the disease. So the odds that you actually have the disease are only about one in ten thousand.
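
The expected counts are easy to tabulate; here is a small Python sketch of the reasoning:

```python
# Expected outcomes when one million people take a 99%-accurate test
# for a one-in-a-million condition.
population = 1_000_000
prevalence = 1 / 1_000_000
accuracy = 0.99

sick = population * prevalence                           # ~1 person
true_positives = sick * accuracy                         # ~1
false_positives = (population - sick) * (1 - accuracy)   # ~10,000

chance = true_positives / (true_positives + false_positives)
print(f"false positives: {false_positives:,.0f}")
print(f"P(disease | positive) = {chance:.5f}")           # about 1 in 10,000
```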

Quarters to the moon. How many quarters would you have to stack to reach the moon? How much would the stack be worth? How much would it weigh? More than the earth? More than the Empire State Building? If you put all those quarters into a pile, would it fit in Central Park?

Well, OK, the question is a little absurd, and there are all kinds of problematic issues: we couldn’t ever actually make such a tower of quarters, as it would topple over; it doesn’t make sense to build a tower to the moon, since the moon is in orbit around the earth and it is moving; the quarters would begin to orbit the earth themselves, or fall back down to the earth or to the moon. But let’s just try to ignore all those problematic issues, and just try to answer how many quarters it would take to make a pile that high.

How far is the moon? I don’t really know, and we could look it up to see what the astronomers say about it, but that would go against the back-of-the-envelope spirit of these problems. So I am just going to use two facts that I picked up years ago: first, I know that the speed of light is about 300,000 km per second; and second, I happen to know that it takes radio signals traveling at the speed of light about one second to get to the moon (eight minutes to the sun), and so the moon is about one light-second away. I remember this fact from having learned it in my childhood, because it was important in a science fiction story I had read, in which radio communication from earth to the moon base had a two-second delay: you send a message, it takes a second to get there, and then a second for the answer to come back. So the moon is one light-second away, and light travels 300,000 km in one second. So the moon is about 300,000 kilometers distant (actual value: about 384,400 km).

Let’s now stack up the quarters. By eyeballing a quarter I had in my pocket, I’d say each quarter is about 2 mm thick, which means 500 quarters per meter, or 500,000 quarters per km. So altogether, we have $300,000\times500,000=150,000,000,000$ many quarters in the stack. A hundred and fifty billion quarters to the moon! It takes four quarters to make a dollar, so that stack is worth about 40 billion dollars, which might be less than you would have expected initially.

Let’s put those quarters in a big pile, packed as tightly as possible. Each quarter is a little over 2 cm in diameter, and so the volume of the rectangular solid bounding the quarter is a little more than 2 cm by 2 cm by 2 mm, or about 800 cubic mm, which we can round up to 1 cubic centimeter. That makes sense, because we can imagine folding a quarter up into a little 1 cm cube. So 1000 quarters take up about one liter. Thus, our quarters-to-the-moon have a volume of about 150 million liters. There are 1000 liters in a cubic meter, and so this is 150,000 cubic meters. Since a cube 10 meters on a side has a volume of 1000 cubic meters, we can fit all those quarters into 150 such cubes. I can easily imagine a giant art installation, with 150 such 10-meter cubes placed in Sheep Meadow in Central Park. They would definitely fit.

How much would the quarters weigh? Some of the students in my lecture had worked as cashiers, and so they were familiar with quarter rolls from the bank, which fit in your palm and hold ten dollars’ worth of quarters, or forty quarters. I can imagine a quarter roll in one hand balancing the weight of a half-pound of gouda cheese at the deli in my other hand. So 40 quarters is .5 pound, or 80 quarters per pound, and since a kilogram is about 2.2 pounds, that makes about 175 quarters per kilogram. We had 150 billion quarters, which at 175 quarters per kilogram comes to roughly 850 million kilograms, or about 850,000 metric tons.
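
Here is the whole quarters-to-the-moon calculation gathered into one Python sketch, using the estimates above:

```python
# Quarters to the moon: count, value, volume, and weight.
moon_km = 300_000                 # one light-second, roughly
quarters_per_km = 500 * 1000      # 2 mm thick -> 500 per meter
count = moon_km * quarters_per_km # 150 billion

dollars = count / 4
cubic_meters = count / 1000 / 1000   # ~1000 quarters per liter, as above
kg = count / 175                     # ~175 quarters per kg (40 per half pound)

print(f"{count:,.0f} quarters, worth about {dollars/1e9:,.1f} billion dollars")
print(f"volume: {cubic_meters:,.0f} cubic meters")     # 150,000
print(f"weight: {kg/1e6:,.0f} million kg")             # ~850 million kg
```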

Since we already imagined the quarters in Sheep Meadow in those 150 large boxes, clearly they weigh far less than the earth, and perhaps something comparable to the Empire State Building.

How much does the Empire State Building weigh? After posing this question, I found out that it is also evidently a famous Google interview question, and you can easily find other blog posts about it. But let me tell you how I thought about it. The Empire State Building is about 100 stories tall, and if we assume 12 to 15 feet per floor, or about 4 meters, this makes the total height (without the tower) about 400 meters. The CUNY Graduate Center where I work is across the corner at Fifth Avenue and 34th Street, and I have walked past the Empire State Building many times. I estimate that the base is about 75 meters on Fifth Avenue by 150 meters on 34th Street. But because of the step-backs, the middle part of the tower is considerably less than that, probably 30 by 80 near the top. Let’s say the average cross section of the building is 50 by 100 meters. Now, how much does that weigh? Of course, there is a lot of steel and masonry in the walls, concrete in the floors, and also all the contents of the building, with desks and paper files and books and interior walls and doors. Those things are heavy. I really have no idea how much those things weigh, but let me imagine that we build a virtual copy of a complete floor from the Empire State Building, without any of the floor beams or concrete or desks or walls or anything, and instead we flood that virtual room with water, until it has the same weight. Of course, the metal and concrete and masonry are much heavier than water, but the actual space is mostly empty air. How much water would it take to have the same weight? Well, let me just guess that we’d have to flood the virtual room about one-quarter deep with water to have the same weight as the actual materials. At 50 by 100 meters by 4 meters tall for each floor, this would mean a pool of water 50 by 100 meters and one meter deep, for a volume of 5000 cubic meters of water. We already said that one cubic meter of water is 1000 liters, each of which weighs one kilogram, so we are talking about $5000\times 1000$ kilograms or 5 million kilograms per floor. With 100 floors, this is 500 million kilograms, or 500,000 metric tons. So according to this estimate, the Empire State Building with all its contents weighs about 500,000 metric tons.
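
Here is the water-equivalent estimate as a Python sketch, with the same guessed inputs:

```python
# Empire State Building weight via the water-equivalent trick above.
floors = 100
cross_section_m2 = 50 * 100       # average footprint after the step-backs
water_depth_m = 1                 # one-quarter of a 4-meter floor height

kg_per_floor = cross_section_m2 * water_depth_m * 1000   # 1000 kg per m^3
total_tons = floors * kg_per_floor / 1000

print(f"{total_tons:,.0f} metric tons")    # ~500,000
```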

So it weighs less than the quarters to the moon (roughly 850 million kilograms); indeed, the Empire State Building weighs somewhat more than half as much as the quarters to the moon, and these two quantities are in the same ballpark.

How many babies are born each day in New York City? The population of New York is approximately ten million people. And those people live about 70 years. Let’s imagine spreading those births out uniformly over the 70 year period. That means one million people born in 7 years. This is about 150,000 people born in one year, which is about 3000 births per week, or about 400 births per day. This estimate would be accurate if the city population were in a steady state, with births balancing deaths, but of course the population is increasing. On the other hand, most of that increase is from people moving here, not just being born here. But then again, the people moving to New York I expect are mainly young adults looking for a career, and so they will contribute to a heightened birth rate as they have children. Shall we add 25 percent more to account for the fact that the city is growing and not in a steady state? In that case, our estimate is that there are about 500 births each day in New York City.
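
In code, the estimate is just a few lines; the 25 percent growth adjustment is the same guess as above:

```python
# Births per day in NYC, assuming a rough steady state plus growth.
population = 10_000_000
lifespan_years = 70

births_per_day = population / lifespan_years / 365   # ~390 in steady state
births_per_day *= 1.25                               # growth adjustment

print(f"~{births_per_day:.0f} births per day")       # ~490, call it 500
```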

There is an endless supply of such questions that can be calculated. I have been talking about them in the lectures for my course Math for Liberal Arts, an undergraduate course aimed at non-math students at my college. In this course, we are fairly free to cover some imaginative topics, and I’ve covered some game theory, some elementary graph theory, a little bit of logic, and now this number sense topic, which I use to try to develop the students’ ability to attack a technical problem by breaking it down into more manageable problems. But meanwhile, I also look upon it just as plain fun.

So here are a few more questions that you might enjoy thinking about. And please make up your own.

  • What is the total volume of air that you have breathed in your lifetime? If you filled a balloon with all your hot air, how big would it be?
  • If you used that balloon as a hot-air balloon (with the hot air at your body temperature), would it have enough buoyancy to lift you and all your possessions? See my post A lifetime of hot air for a detailed answer.
  • How many NYC metro cards exist? (Each lasts about a year or two; they get thrown out, but still exist in the landfill; the system has used metro cards for about twenty years; there is a large stock of new not-yet-used cards.)
  • How many doorknobs are there in your building?
  • How many subway tiles are there in the NYC subway system?
  • How many riders were on my crowded 4 train downtown this morning?
  • How many lightswitches are there on your university campus?
  • How much water does NYC use each day?
  • What are the odds that you drank a molecule of water that was once also drunk by Julius Caesar?
  • How many people have there been? What fraction are currently alive?

Please try to figure them out, and post your solutions in the comments below!

Are all Gödel sentences equivalent?

I should like to consider a family of natural questions concerning the Gödel sentence — the sentence asserting its own non-provability, used by Gödel to prove the incompleteness theorem — and specifically the question whether we are really entitled to speak of the Gödel sentence. Is it unique? Is it unique up to equivalence? Up to provable equivalence? Could there be more than one Gödel sentence, perhaps for different manners of arithmetizing the syntax? Are they all provably equivalent? These questions have come up a number of times in various contexts, and since James Propp just sent me an email inquiry about it yesterday, let me address the questions here in reply (he is planning an upcoming post in a few weeks about incompleteness and self-reference — check back later for the link). I shall make use only of elementary classical incompleteness ideas, and I assume this has all been known for some time.

For definiteness, let us take first-order Peano Arithmetic ($\newcommand\PA{\text{PA}}\PA$) as the base theory over which we are considering provability, although similar observations can be made concerning other base theories. By formalizing the syntax in arithmetic using Gödel coding, we may easily produce an arithmetically expressible provability predicate $\newcommand\Prov{\text{Prov}}\Prov(x)$, which asserts that $x$ is the Gödel code of a $\PA$-provable sentence. By using this predicate and the fixed-point lemma, we may construct a sentence $\psi$ that asserts its own non-provability, meaning precisely: $\PA\vdash\psi\leftrightarrow\neg\Prov(\ulcorner\psi\urcorner)$. Such a sentence $\psi$ is known as the Gödel sentence, and we may use it to prove Gödel’s first incompleteness theorem as follows. Namely, it is easy to see that $\psi$ cannot be provable in $\PA$, for if it were, then in virtue of what $\psi$ asserts, we would have also proved that $\psi$ is not provable, contrary to fact. So $\psi$ is not provable, and therefore we see that indeed $\psi$ is actually true. So the sentence $\psi$ is true, yet unprovable. By paying a little closer attention to the argument, what we have actually argued is that $\PA$ itself proves that if $\newcommand\Con{\text{Con}}\Con(\PA)$, then $\psi$.

Of course, there are many fixed points, and for example any sentence that is provably equivalent to $\psi$, such as $\psi\wedge\psi$, is also provably equivalent to its own non-provability, and therefore serves as yet another Gödel sentence. In this trivial sense, there are infinitely many different Gödel sentences. But these particular sentences, by construction, are provably equivalent. Are all the Gödel sentences equivalent?

Indeed, a bit more generally, suppose that we have two implementations of the provability predicate, using perhaps different formalizations of the syntax, with $\varphi$ and $\psi$ being fixed points of non-provability-in-$\PA$, so that $\varphi$ and $\psi$ each assert their own non-provability with respect to those predicates. Must $\PA$ prove that $\varphi$ and $\psi$ are equivalent?

The first, main answer is that yes, indeed, if we have used a natural provability predicate, then all the Gödel sentences are provably equivalent, and this remains true even for different natural manners of formalizing the syntax. In this sense, therefore, there really is only one Gödel sentence and we are entitled to speak of the Gödel sentence. The reason is that every such Gödel sentence is provably equivalent to the assertion $\Con(\PA)$, and for natural formalizations of the syntax, meaning implementations where we can provably translate proofs from one system to another and the meta-theory, these assertions are all equivalent. To see this, suppose that $\psi$ is a Gödel sentence, which means that $\psi$ asserts its own non-provability. Since $\psi$ implies that there is a non-provable sentence, namely $\psi$ itself, it follows immediately that $\psi$ implies $\Con(\PA)$; conversely, if $\Con(\PA)$ holds, then the argument of the first incompleteness theorem that I mentioned above shows that $\psi$ is true. So $\PA\vdash\Con(\PA)\leftrightarrow\psi$. So all the fixed points are equivalent to the assertion $\Con(\PA)$. For systems where we can provably translate proofs from one system to another, the corresponding consistency assertions $\Con(\PA)$ are equivalent. And so it is fine to speak of the Gödel sentence.

But let me now discuss what happens if we should implement a more exotic form of the provability predicate. For example, consider the Rosser provability predicate, where we say that a sentence $\sigma$ is Rosser provable, if there is a proof of $\sigma$, but no shorter proof of $\neg\sigma$, meaning no proof of $\neg\sigma$ with a smaller Gödel code. The Rosser sentence $\rho$ is a non-provability fixed point of this notion, saying, “I am not Rosser provable.” Since Rosser provability is intuitively a stronger notion of provability, it would be reasonable to expect that the Rosser non-provability assertion is weaker than the Gödel sentence, and indeed that is the case. If the base theory is consistent, then the Rosser sentence cannot be provable, since if it were, then there would have to be a smaller proof of $\neg\rho$, violating consistency; but also, $\neg\rho$ cannot be provable, since this sentence implies immediately that $\rho$ is also provable with a shorter proof, in light of what $\rho$ asserts. So $\PA$ proves that $\Con(\PA)$ implies both $\Con(\PA+\rho)$ and $\Con(\PA+\neg\rho)$, and moreover, $\PA\vdash\Con(\PA)\to\rho$. But since $\PA\vdash\Con(\PA)\to\Con(\PA+\rho)$, it follows that $\PA$, if consistent, cannot prove the converse, $\rho\to\Con(\PA)$, since otherwise $\PA+\rho$ would prove its own consistency, in violation of the second incompleteness theorem. So the Rosser sentence $\rho$ is strictly weaker than $\Con(\PA)$ over $\PA$, and hence also strictly weaker than the usual Gödel sentence. But since the Rosser sentence $\rho$ is itself a fixed point of non-provability with respect to the Rosser formalization of provability, it shows that exotic formalizations of the provability predicate can indeed give rise to inequivalent fixed-point assertions.

Note that the Rosser provability predicate does not sustain provable translations to any of the usual (natural) formalizations of provability, because we cannot prove in $\PA$ as a general statement that whenever a statement is provable, then it is Rosser provable. We can, however, generally translate specific proofs to Rosser proofs, and to the meta-theory, assuming that the base theory is consistent.

Lastly, let’s consider a somewhat more exotic provability predicate, arising from the Feferman-style enumerations of $\PA$, which are arithmetically definable but not computably axiomatizable. Specifically, consider the axiomatization of $\PA$, where we continue to add the usual $\PA$ axioms, but only so long as the resulting theory remains consistent. This way of describing the theory gives rise to a corresponding provability predicate, which expresses provability in that enumerated theory. So a sentence $\theta$ is Feferman provable if it is provable using axioms that come from a consistent initial segment of the $\PA$ axiomatization. If $\phi$ is a Feferman-non-provability fixed point, then $\PA$ proves that $\phi$ is equivalent to the assertion that $\phi$ is not Feferman provable. Since $\PA$ easily proves that the Feferman theory is consistent, and also that any finite collection of $\PA$ axioms is part of the Feferman theory, it follows that the negation of any particular theorem of $\PA$ is a Feferman-non-provability fixed point, since $\PA$ proves that it is false and also that it is not provable in the Feferman theory. But those statements are not equivalent to the usual Gödel sentence, since they are inconsistent with $\PA$.

My very first lemma, which also happened to involve a philosophical dispute

Let me recall the time of my very first lemma, which also happened to involve a philosophical dispute.

It was about 35 years ago; I was a kid in ninth grade at McKinley Junior High School, taking a class in geometry, taught by a charismatic math teacher. We were learning how to do proofs, which in that class always consisted of a numbered list of geometrical assertions, with a specific reason given for each assertion, either stating that it was “given” or that it followed from previous assertions by a theorem that we had come to know. Only certain types of reasons were allowed.

My instructor habitually used the overhead projector, writing on a kind of infinite scroll of transparency film, which he could wind up on one end of the projector, so as never to run out of room. During the semester, he had filled enough spools, it seemed, to fill the library of Alexandria.

One day, it came to be my turn to present to the rest of the class my proof of a certain geometrical theorem I had been assigned. I took the black marker and drew out my diagram and theorem statement. In my proof, I had found it convenient first to establish a certain critical fact, that two particular line segments in my diagram were congruent, $\overline{PQ}\cong\overline{RS}$. In order to do so, I had added various construction lines to my diagram and reasoned with side-angle-side and so on.

Having established the congruency, I had then wanted to continue with my proof of the theorem. Since the previous construction lines were cluttering up my diagram, however, I simply erased them, leaving only my original diagram.

The class erupted with objection!  How could I sensibly continue now with my proof, claiming that $\overline{PQ}\cong\overline{RS}$, after I had erased the construction lines? After all, are those line segments still congruent once we erase the construction lines that provided the reason we first knew this to be true? Many of the students believed that my having erased the construction lines invalidated my proof.

So there I was, in a ninth-grade math class, making a philosophical argument to my fellow students that the truth of the congruence $\overline{PQ}\cong\overline{RS}$ was independent of my having drawn the construction lines, and that we could rely on the truth of that fact later on in my proof, even if I were to erase those construction lines.

After coming to an uneasy, tentative resolution of this philosophical dispute, I was then allowed to continue with the rest of my proof, establishing the main theorem.

I realized only much later that this had been my very first lemma, since I had isolated a mathematically central fact about a certain situation, proving it with a separate argument, and then I had used that fact in the course of proving a more general theorem.

The hierarchy of logical expressivity

I’d like to give a simple account of what I call the hierarchy of logical expressivity for fragments of classical propositional logic. The idea is to investigate and classify the expressive power of fragments of the traditional language of propositional logic, with the five familiar logical connectives listed below, by considering subsets of these connectives and organizing the corresponding sublanguages of propositional logic into a hierarchy of logical expressivity.

  • conjunction (“and”), denoted $\wedge$
  • disjunction (“or”), denoted  $\vee$
  • negation (“not”), denoted $\neg$
  • conditional (“if…, then”), denoted  $\to$
  • biconditional (“if and only if”), denoted  $\renewcommand\iff{\leftrightarrow}\iff$

With these five connectives, there are, of course, precisely thirty-two ($32=2^5$) subsets, each giving rise to a corresponding sublanguage, the language of propositional assertions using only those connectives. Which sets of connectives are equally as expressive or more expressive than which others? Which sets of connectives are incomparable in their expressivity? How many classes of expressivity are there?

Before continuing, let me mention that Ms. Zoë Auerbach, one of the students in my logic-for-philosophers course this past semester, Survey of Logic for Philosophers, in the philosophy PhD program at the CUNY Graduate Center, had chosen to write her term paper on this topic.  She has kindly agreed to make her paper, “The hierarchy of expressive power of the standard logical connectives,” available here, and I shall post it soon.

To focus the discussion, let us define what I call the (pre)order of logical expressivity on sets of connectives. Namely, for any two sets of connectives, I define that $A\leq B$ with respect to logical expressivity, just in case every logical expression in any number of propositional atoms using only connectives in $A$ is logically equivalent to an expression using only connectives in $B$. Thus, $A\leq B$ means that the connectives in $B$ are collectively at least as expressive as the connectives in $A$, in the sense that with the connectives in $B$ you can express any logical assertion that you were able to express with the connectives in $A$. The corresponding equivalence relation $A\equiv B$ holds when $A\leq B$ and $B\leq A$, and in this case we shall say that the sets are expressively equivalent, for this means that the two sets of connectives can express the same logical assertions.

Expressivity hierarchy

The full set of connectives $\{\wedge,\vee,\neg,\to,\iff\}$ is well-known to be complete for propositional logic in the sense that every conceivable truth function, with any number of propositional atoms, is logically equivalent to an expression using only the classical connectives. Indeed, already the sub-collection $\{\wedge,\vee,\neg\}$ is fully complete, and hence expressively equivalent to the full collection, because every truth function can be expressed in disjunctive normal form, as a disjunction of finitely many conjunction clauses, each consisting of a conjunction of propositional atoms or their negations (and hence altogether using only disjunction, conjunction and negation). One can see this easily, for example, by noting that for any particular row of a truth table, there is a conjunction expression that is true on that row and only on that row. For example, the expression $p\wedge\neg r\wedge s\wedge \neg t$ is true on the row where $p$ is true, $r$ is false, $s$ is true and $t$ is false, and one can make similar expressions for any row in any truth table. Simply by taking the disjunction of such expressions for suitable rows where a $T$ is desired, one can produce an expression in disjunctive normal form that is true in any desired pattern (use $p\wedge\neg p$ for the never-true truth function). Therefore, every truth function has a disjunctive normal form, and so $\{\wedge,\vee,\neg\}$ is complete.
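
The completeness argument is effectively an algorithm, and one can code it directly. Here is a small Python sketch that builds a disjunctive normal form from any given truth function; the function name and string representation are mine, chosen just for illustration:

```python
from itertools import product

def dnf(truth_function, atoms):
    """Build a DNF expression equivalent to the given truth function,
    mirroring the completeness argument above: one conjunction clause
    per true row of the truth table."""
    clauses = []
    for row in product([True, False], repeat=len(atoms)):
        if truth_function(*row):
            clause = " & ".join(a if v else f"~{a}" for a, v in zip(atoms, row))
            clauses.append(f"({clause})")
    # use a contradiction for the never-true function, as in the text
    return " | ".join(clauses) if clauses else f"({atoms[0]} & ~{atoms[0]})"

# Example: the biconditional p <-> q in disjunctive normal form.
print(dnf(lambda p, q: p == q, ["p", "q"]))   # (p & q) | (~p & ~q)
```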

Pressing this further, one can eliminate either $\wedge$ or $\vee$ by systematically applying the de Morgan laws
$$p\vee q\quad\equiv\quad\neg(\neg p\wedge\neg q)\qquad\qquad p\wedge q\quad\equiv\quad\neg(\neg p\vee\neg q),$$
which allow one to reduce disjunction to conjunction and negation or reduce conjunction to disjunction and negation. It follows that $\{\wedge,\neg\}$ and $\{\vee,\neg\}$ are each complete, as is any superset of these sets, since a set is always at least as expressive as any of its subsets. Similarly, because we can express disjunction with negation and the conditional via $$p\vee q\quad\equiv\quad \neg p\to q,$$ it follows that the set $\{\to,\neg\}$ can express $\vee$, and hence also is complete. From these simple observations, we may conclude that each of the following fourteen sets of connectives is complete. In particular, they are all expressively equivalent to each other.
$$\{\wedge,\vee,\neg,\to,\iff\}$$
$$\{\wedge,\vee,\neg,\iff\}\qquad\{\wedge,\to,\neg,\iff\}\qquad\{\vee,\to,\neg,\iff\}\qquad \{\wedge,\vee,\neg,\to\}$$
$$\{\wedge,\neg,\iff\}\qquad\{\vee,\neg,\iff\}\qquad\{\to,\neg,\iff\}$$
$$\{\wedge,\vee,\neg\}\qquad\{\wedge,\to,\neg\}\qquad\{\vee,\to,\neg\}$$
$$\{\wedge,\neg\}\qquad \{\vee,\neg\}\qquad\{\to,\neg\}$$

Notice that each of those complete sets includes the negation connective $\neg$. If we drop it, then the set $\{\wedge,\vee,\to,\iff\}$ is not complete, since each of these four connectives is truth-preserving, and so any logical expression made from them will have a $T$ in the top row of the truth table, where all atoms are true. In particular, these four connectives collectively cannot express negation $\neg$, and so they are not complete.

Clearly, we can express the biconditional as two conditionals, via $$p\iff q\quad\equiv\quad (p\to q)\wedge(q\to p),$$ and so the set $\{\wedge,\vee,\to,\iff\}$ is expressively equivalent to $\{\wedge,\vee,\to\}$. And since disjunction can be expressed from the conditional with $$p\vee q\quad\equiv\quad ((p\to q)\to q),$$ it follows that the set is expressively equivalent to $\{\wedge,\to\}$. In light of $$p\wedge q\quad\equiv\quad p\iff(p\to q),$$ it follows that $\{\to,\iff\}$ can express conjunction and hence is also expressively equivalent to $\{\wedge,\vee,\to,\iff\}$. Since
$$p\vee q\quad\equiv\quad(p\wedge q)\iff(p\iff q),$$ it follows that $\{\wedge,\iff\}$ can express $\vee$ and hence also $\to$, because $$p\to q\quad\equiv\quad q\iff(q\vee p).$$ Similarly, using $$p\wedge q\quad\equiv\quad (p\vee q)\iff(p\iff q),$$ we can see that $\{\vee,\iff\}$ can express $\wedge$ and hence also is expressively equivalent to $\{\wedge,\vee,\iff\}$, which we have argued is equivalent to $\{\wedge,\vee,\to,\iff\}$.  For these reasons, the following sets of connectives are expressively equivalent to each other.
$$\{\wedge,\vee,\to,\iff\}$$
$$\{\wedge,\vee,\to\}\qquad\{\wedge,\vee,\iff\}\qquad \{\vee,\to,\iff\}\qquad \{\wedge,\to,\iff\}$$
$$\{\wedge,\iff\}\qquad \{\vee,\iff\}\qquad \{\to,\iff\}\qquad \{\wedge,\to\}$$
And as I had mentioned, these sublanguages are strictly less expressive than the full language, because these four connectives are all truth-preserving and therefore unable to express negation.
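
All of the equivalences used above are easily verified mechanically by checking every row of the truth table, as in this short Python sketch (encoding $p\to q$ as `(not p) or q` and $p\iff q$ as `==`):

```python
from itertools import product

# Check the equivalences used above by brute force over all truth rows.
identities = [
    ("p | q  ==  ~p -> q",        lambda p, q: (p or q) == ((not p) or q)),
    ("p | q  ==  (p -> q) -> q",  lambda p, q: (p or q) == (not (not p or q) or q)),
    ("p & q  ==  p <-> (p -> q)", lambda p, q: (p and q) == (p == ((not p) or q))),
    ("p | q  ==  (p & q) <-> (p <-> q)",
     lambda p, q: (p or q) == ((p and q) == (p == q))),
]
for name, check in identities:
    assert all(check(p, q) for p, q in product([True, False], repeat=2))
    print("verified:", name)
```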

The set $\{\wedge,\vee\}$, I claim, is unable to express any of the other fundamental connectives, because $\wedge$ and $\vee$ are each false-preserving, and so any logical expression built from $\wedge$ and $\vee$ will have $F$ on the bottom row of the truth table, where all atoms are false. Meanwhile, $\to,\iff$ and $\neg$ are not false-preserving, since they each have $T$ on the bottom row of their defining tables. Thus, $\{\wedge,\vee\}$ lies strictly below the languages mentioned in the previous paragraph in terms of logical expressivity.

Meanwhile, using only $\wedge$ we cannot express $\vee$, since any expression in $p$ and $q$ using only $\wedge$ will have the property that any false atom will make the whole expression false (this uses the associativity of $\wedge$), and $p\vee q$ does not have this feature. Similarly, $\vee$ cannot express $\wedge$, since any expression using only $\vee$ is true if any one of its atoms is true, but $p\wedge q$ is not like this. For these reasons, $\{\wedge\}$ and $\{\vee\}$ are both strictly weaker than $\{\wedge,\vee\}$ in logical expressivity.

Next, I claim that $\{\vee,\to\}$ cannot express $\wedge$, and the reason is that the logical operations of $\vee$ and $\to$ each have the property that any expression built from them has at least as many $T$’s as $F$’s in the truth table. This property is true of any propositional atom, and if $\varphi$ has the property, so do $\varphi\vee\psi$ and $\psi\to\varphi$, since these expressions will be true at least as often as $\varphi$ is. Since $\{\vee,\to\}$ cannot express $\wedge$, this language is strictly weaker than $\{\wedge,\vee,\to,\iff\}$ in logical expressivity. Actually, since as we noted above $$p\vee q\quad\equiv\quad ((p\to q)\to q),$$ it follows that $\{\vee,\to\}$ is expressively equivalent to $\{\to\}$.

Meanwhile, since $\vee$ is false-preserving, it cannot express $\to$, and so $\{\vee\}$ is strictly less expressive than $\{\vee,\to\}$, which is expressively equivalent to $\{\to\}$.

Consider next the language corresponding to $\{\iff,\neg\}$. I claim that this set is not complete. This argument is perhaps a little more complicated than the other arguments we have given so far. What I claim is that both the biconditional and negation are parity-preserving, in the sense that any logical expression using only $\neg$ and $\iff$ will have an even number of $T$’s in its truth table. This is certainly true of any propositional atom, and if true for $\varphi$, then it is true for $\neg\varphi$, since there are an even number of rows altogether; finally, if both $\varphi$ and $\psi$ have even parity, then I claim that $\varphi\iff\psi$ will also have even parity. To see this, note first that this biconditional is true just in case $\varphi$ and $\psi$ agree, either having the pattern T/T or F/F. If there are an even number of times where both are true jointly T/T, then the remaining occurrences of T/F and F/T will also be even, by considering the T’s for $\varphi$ and $\psi$ separately, and consequently, the number of occurrences of F/F will be even, making $\varphi\iff\psi$ have even parity. If the pattern T/T is odd, then also T/F and F/T will be odd, and so F/F will have to be odd to make up an even number of rows altogether, and so again $\varphi\iff\psi$ will have even parity. Since conjunction, disjunction and the conditional do not have even parity, it follows that $\{\iff,\neg\}$ cannot express any of the other fundamental connectives.
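
The parity invariant is also easy to check mechanically; here is a small Python sketch (the helper name is mine):

```python
from itertools import product

def parity_of_truths(expr, n=2):
    """Parity of the number of T's in the truth table of expr over n atoms."""
    return sum(expr(*row) for row in product([True, False], repeat=n)) % 2

# Expressions built from <-> and ~ have an even number of T's...
print(parity_of_truths(lambda p, q: not (p == q)))        # 0 (even)
# ...while &, | and -> each have an odd number, so none is expressible.
print(parity_of_truths(lambda p, q: p and q))             # 1 (odd)
print(parity_of_truths(lambda p, q: p or q))              # 1 (odd)
print(parity_of_truths(lambda p, q: (not p) or q))        # 1 (odd)
```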

Meanwhile, $\{\iff\}$ is strictly less expressive than $\{\iff,\neg\}$, since the biconditional $\iff$ is truth-preserving but negation is not. And clearly $\{\neg\}$ can express only unary truth functions, since any expression using only negation has only one propositional atom, as in $\neg\neg\neg p$. So both $\{\iff\}$ and $\{\neg\}$ are strictly less expressive than $\{\iff,\neg\}$.

Lastly, I claim that $\iff$ is not expressible from $\to$. If it were, then since $\vee$ is also expressible from $\to$, we would have that $\{\vee,\iff\}$ is expressible from $\to$, contradicting our earlier observation that $\{\to\}$ is strictly less expressive than $\{\vee,\iff\}$, as this latter set can express $\wedge$, but $\to$ cannot, since every expression in $\to$ has at least as many $T$’s as $F$’s in its truth table.

These observations altogether establish the hierarchy of logical expressivity shown in the diagram displayed above.

It is natural, of course, to want to extend the hierarchy of logical expressivity beyond the five classical connectives. If one considers all sixteen binary logical operations, then Greg Restall has kindly produced the following image, which shows how the hierarchy we discussed above fits into the resulting hierarchy of expressivity. This diagram shows only the equivalence classes, rather than all $65536=2^{16}$ sets of connectives.

Full binary expressivity lattice

If one wants to go beyond merely the binary connectives, then one lands at Post’s lattice, pictured below (image due to Emil Jeřábek), which is the countably infinite (complete) lattice of logical expressivity for all sets of truth functions, using any given set of Boolean connectives. Every such set is finitely generated.

How does a slinky fall?

Have you ever observed carefully how a slinky falls? Suspend a slinky from one end, letting it hang freely in the air under its own weight, and then, let go! The slinky begins to fall. The top of the slinky, of course, begins to fall the moment you let go of it. But what happens at the bottom of the slinky? Does it also start to fall at the same moment you release the top? Or perhaps it moves upward, as the slinky contracts as it falls? Or does the bottom of the slinky simply hang motionless in the air for a time?

The surprising fact is that indeed the bottom of the slinky doesn’t move at all when you release the top of the slinky! It hangs momentarily motionless in the air in exactly the same coiled configuration that it had before the drop. This is the surprising slinky drop effect.

My son (age 13, eighth grade) took up the topic for his science project this year at school.  He wanted to establish the basic phenomenon of the slinky drop effect and to investigate some of the subtler aspects of it.  For a variety of different slinky types, he filmed the slinky drops against a graded background with a high-speed camera, and then replayed them in slow motion to watch carefully and take down the data.  Here are a few sample videos. He made about a dozen drops altogether.  For the actual data collection, the close-up videos were more useful. Note the ring markers A, B, C, and so on, in some of the videos.

 

See more videos here.

For each slinky drop video, he went through the frames and recorded the vertical location of various marked rings (you can see the labels A, B, C and so on in some of the videos above) into a spreadsheet. From this data he then produced graphs such as the following for each slinky drop:

Small metal slinky graph

 

Large metal slinky graph

Plastic Slinky graph

In each case, you can see clearly in the graph the moment when the top of the slinky is released, since this is the point at which the top line begins to descend. The thing to notice next — the main slinky drop effect — is that the lower parts of the slinky do not move at the same time. Rather, the lower lines remain horizontal for some time after the drop point. Basically, they remain horizontal until the bulk of the slinky nearly descends upon them. So the experiments clearly establish the main slinky drop phenomenon: the bottom of the slinky remains motionless for a time hanging in the air unchanged after the top is released.

In addition to this effect, however, my son was focused on investigating a much more subtle aspect of the slinky drop phenomenon. Namely, when exactly does the bottom of the slinky start to move?  Some have said that the bottom moves only when the top catches up to it; but my son hypothesized, based on observations, as well as discussions with his father and uncles, that the bottom should start to move slightly before the bulk of the slinky meets it. Namely, he thought that when you release the top of the slinky, a wave of motion travels through the slinky, and this wave travels slightly faster than the top of the slinky falls. The bottom moves, he hypothesized, when the wave front first gets to the bottom.

His data contains some confirming evidence for this subtler hypothesis, but for some of the drops, the experiment was inconclusive on this smaller effect. Overall, he had a great time undertaking the science project.

June 2016 Update: On the basis of his science fair poster and presentation, my son was selected as nominee to the Broadcom Masters national science fair competition! He is now competing against other nominees (top 10% of participating science fairs) for a chance to present his research in Washington at the final national competition next October.

September 2016 Update: My son has now been selected as a Broadcom Masters semi-finalist, placing him in the top 300 amongst more than 6000 nominees. The finalists will be chosen in a few weeks, with the chance to present in Washington, D.C.

Slinky drop on YouTube | Modeling a falling slinky (Wired)
Explaining an astonishing slinky | Slinky drop on physics.stackexchange
Cross & Wheatland, “Modeling a falling slinky”

An equivalent formulation of the GCH

The continuum hypothesis CH is the assertion that the size of the power set of a countably infinite set $\aleph_0$ is the next larger cardinal $\aleph_1$, or in other words, that $2^{\aleph_0}=\aleph_1$. The generalized continuum hypothesis GCH makes this same assertion about all infinite cardinals, namely, that the power set of any infinite cardinal $\kappa$ has size the successor cardinal $\kappa^+$, or in other words, $2^\kappa=\kappa^+$.

Yesterday I received an email from Geoffrey Caveney, who proposed to me the following axiom, which I have given a name.   First, for any set $F$ of cardinals, define the $F$-restricted power set operation $P_F(Y)=\{X\subseteq Y\mid |X|\in F\}$ to consist of the subsets of $Y$ having a cardinality allowed by $F$.  The only cardinals of $F$ that matter are those that are at most the cardinality of $Y$.

The Alternative GCH is the assertion that for every cardinal number $\kappa$, there is a set $F$ of cardinals such that the $F$-restricted power set $P_F(\kappa)$ has size $\kappa^+$.

Caveney was excited about his axiom for three reasons. First, a big part of his motivation for considering the axiom was the observation that the equation $2^\kappa=\kappa^+$ is simply not correct for finite cardinals $\kappa$ (other than $0$ and $1$) — and this is why the GCH makes the assertion only for infinite cardinals $\kappa$ — whereas the alternative GCH axiom makes a uniform statement for all cardinals, including the finite cardinals, and it gets the right answer for the finite cardinals. Specifically, for any natural number $n$, we can let $F=\{0,1\}$, and then note that $n$ has exactly $n+1$ many subsets of size in $F$. Second, Caveney had also observed that the GCH implies his axiom, since as we just mentioned, it is true for the finite cardinals and for infinite $\kappa$ we can take $F=\{\kappa\}$, using the fact that every infinite cardinal $\kappa$ has $2^\kappa$ many subsets of size $\kappa$ (we are working in ZFC). Third, Caveney had noticed that his axiom implies the continuum hypothesis, since in the case that $\kappa=\aleph_0$, there would be a family $F$ for which $P_F(\aleph_0)$ has size $\aleph_1$. But since there are only countably many finite subsets of $\aleph_0$, it follows that $F$ must include $\aleph_0$ itself, and so this would mean that $\aleph_0$ has only $\aleph_1$ many infinite subsets, and this implies CH.

To my way of thinking, the natural question to consider was whether Caveney’s axiom was actually weaker than GCH or not. At first I noticed that the axiom implies $2^{\aleph_1}=\aleph_2$ and similarly $2^{\aleph_n}=\aleph_{n+1}$, getting us up to $\aleph_\omega$. Then, after a bit I noticed that we can push the argument through all the way.

Theorem. The alternative GCH is equivalent to the GCH.

Proof. We’ve already argued for the converse implication, so it remains only to show that the alternative GCH implies the GCH. Assume that the alternative GCH holds.

We prove the GCH by transfinite induction. For the anchor case, we’ve shown already above that the GCH holds at $\aleph_0$, that is, that CH holds.

For the successor case, assume that the GCH holds at some $\delta$, so that $2^\delta=\delta^+$, and consider the case $\kappa=\delta^+$. By the alternative GCH, there is a family $F$ of cardinals such that $|P_F(\kappa)|=\kappa^+$. If every cardinal in $F$ is less than $\kappa$, then $P_F(\kappa)$ has size at most $\kappa^{<\kappa}=(\delta^+)^\delta=2^\delta=\delta^+=\kappa$, which is too small. So $\kappa$ itself must be in $F$, and from this it follows that $\kappa$ has at most $\kappa^+$ many subsets of size $\kappa$, which implies $2^\kappa=\kappa^+$. So the GCH holds at $\kappa$, and we’ve handled the successor case.

For the limit case, suppose that $\kappa$ is a limit cardinal and the GCH holds below $\kappa$. So $\kappa$ is a strong limit cardinal. By the alternative GCH, there is a family $F$ of cardinals for which $|P_F(\kappa)|=\kappa^+$. It cannot be that all cardinals in $F$ are less than the cofinality of $\kappa$, since in this case all the subsets of $\kappa$ in $P_F(\kappa)$ would be bounded in $\kappa$, and so $P_F(\kappa)$ would have size at most $\kappa$, since $\kappa$ is a strong limit. So there must be a cardinal $\mu$ in $F$ with $\newcommand\cof{\text{cof}}\cof(\kappa)\leq\mu\leq\kappa$. But in this case, it follows that $\kappa^\mu=\kappa^+$, and this implies $\kappa^{\cof(\kappa)}=\kappa^+$, since by König’s theorem it is always at least $\kappa^+$, and it cannot be bigger if $\kappa^\mu=\kappa^+$. Finally, since $\kappa$ is a strong limit cardinal, it follows easily that $2^\kappa=\kappa^{\cof(\kappa)}$, since every subset of $\kappa$ is determined by its initial segments, and hence by a $\cof(\kappa)$-sequence of bounded subsets of $\kappa$, of which there are only $\kappa$ many. So we have established that $2^\kappa=\kappa^+$ in the limit case, completing the induction. So we get all instances of the GCH.
QED

The finite axiom of symmetry

At the conclusion of my talk today for the CUNY Math Graduate Student Colloquium, “Freiling’s axiom of symmetry, or throwing darts at the real line,” I had assigned an exercise for the audience, and so I’d like to discuss the solution here.


The axiom of symmetry asserts that if you assign to each real number $x$ a countable set $A_x\subset\mathbb{R}$, then there should be two reals $x,y$ for which $x\notin A_y$ and $y\notin A_x$.

Informally, if you have attached to each element $x$ of a large set $\mathbb{R}$ a certain comparatively small subset $A_x$, then there should be two independent points $x,y$, meaning that neither is in the set attached to the other.

The challenge exercise I had made is to prove a finite version of this:

The finite axiom of symmetry. For each finite number $k$ there is a sufficiently large finite number $n$ such that for any set $X$ of size $n$ and any assignment $x\mapsto A_x$ of elements $x\in X$ to subsets $A_x\subset X$ of size at most $k$, there are elements $x,y\in X$ such that $x\notin A_y$ and $y\notin A_x$.

Proof. Suppose we are given a finite number $k$. Let $n$ be any number larger than $k^2+k$. Consider any set $X$ of size $n$ and any assignment $x\mapsto A_x$ of elements $x\in X$ to subsets $A_x\subset X$ of size at most $k$. Let $x_0,x_1,x_2,\dots,x_k$ be any $k+1$ many elements of $X$. The union $\bigcup_{i\leq k} A_{x_i}$ has size at most $(k+1)k=k^2+k$, and so by the choice of $n$ there is some element $y\in X$ not in any $A_{x_i}$. Since $A_y$ has size at most $k$, there must be some $x_i$ not in $A_y$. So $x_i\notin A_y$ and $y\notin A_{x_i}$, and we have fulfilled the desired conclusion. QED

Question. What is the optimal size of $n$ as a function of $k$?

It seems unlikely to me that my argument gives the optimal bound, since the proof finds at least one element of the pair inside any subset of $X$ of size $k+1$, which is a stronger property than requested. So it seems likely to me that the optimal bound will be smaller.
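For very small values of $k$, one can explore the optimal bound by brute force. Here is a naive Python sketch (purely illustrative, and reading the conclusion as allowing $x=y$, so that an assignment can fail only if every element belongs to its own assigned set):

    from itertools import combinations, product

    def all_assignments(n, k):
        """All assignments x -> A_x with A_x a subset of {0,...,n-1} of size at most k."""
        subsets = [frozenset(c) for r in range(k + 1)
                   for c in combinations(range(n), r)]
        return product(subsets, repeat=n)

    def has_symmetric_pair(A, n):
        """Is there a pair x, y with x not in A_y and y not in A_x?"""
        return any(x not in A[y] and y not in A[x]
                   for x in range(n) for y in range(n))

    def optimal_n(k, limit=8):
        """The least n up to the limit for which every assignment admits such a pair."""
        for n in range(1, limit + 1):
            if all(has_symmetric_pair(A, n) for A in all_assignments(n, k)):
                return n
        return None

    print(optimal_n(1), optimal_n(2))  # feasible only for very small k

On this reading the search reports $n=2$ for $k=1$ and $n=4$ for $k=2$, consistent with the suspicion that the optimal bound grows linearly with $k$ rather than quadratically; insisting on $x\neq y$ in the conclusion would of course change the counts.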

Every function can be computable!

I’d like to share a simple proof I’ve discovered recently of a surprising fact:  there is a universal algorithm, capable of computing any given function!

Wait, what? What on earth do I mean? Can’t we prove that some functions are not computable?  Yes, of course.

What I mean is that there is a universal algorithm, a Turing machine program capable of computing any desired function, if only one should run the program in the right universe. There is a Turing machine program $p$ with the property that for any function $f:\newcommand\N{\mathbb{N}}\N\to\N$ on the natural numbers, including non-computable functions, there is a model of arithmetic or set theory inside of which the function computed by $p$ agrees exactly with $f$ on all standard finite input. You have to run the program in a different universe in order that it will compute your desired function $f$.
$\newcommand\ZFC{\text{ZFC}}
\newcommand\PA{\text{PA}}
\newcommand\Con{\mathop{\text{Con}}}
\newcommand\proves{\vdash}
\newcommand{\concat}{\mathbin{{}^\smallfrown}}
\newcommand\restrict{\upharpoonright}
$
Theorem. There is a Turing machine program $p$, carrying out the specific algorithm described in the proof, such that for any function $f:\N\to\N$, there is a model of arithmetic $M\models\PA$, or indeed a model of set theory $M\models\ZFC$ or more (if consistent), such that the function computed by program $p$ inside $M$ agrees exactly with $f$ on all standard finite input.


The proof is elementary, relying essentially only on the ideas of the classical proof of the Gödel-Rosser theorem. To briefly review, for any computably axiomatized theory $T$ extending $\PA$, there is a corresponding sentence $\rho$, called the Rosser sentence, which asserts, “for any proof of $\rho$ in $T$, there is a smaller proof of $\neg\rho$.” That is, by smaller, I mean that the Gödel-code of the proof is smaller. One constructs the sentence $\rho$ by a simple application of the Gödel fixed-point lemma, just as one constructs the usual Gödel sentence that asserts its own non-provability. The basic classical facts concerning the Rosser sentence include the following:

  • If $T$ is consistent, then so are both $T+\rho$ and $T+\neg\rho$.
  • $\PA+\Con(T)$ proves $\rho$.
  • The theories $T$, $T+\rho$ and $T+\neg\rho$ are equiconsistent.
  • If $T$ is consistent, then $T+\rho$ does not prove $\Con(T)$.

The first statement is the essential assertion of the Gödel-Rosser theorem, and it is easy to prove: if $T$ is consistent and $T\proves\rho$, then the proof would be finite in the meta-theory, and so since $T$ would have to prove that there is a smaller proof of $\neg\rho$, that proof would also be finite in the meta-theory and hence an actual proof, contradicting the consistency of $T$. Similarly, if $T\proves\neg\rho$, then the proof would be finite in the meta-theory, and so $T$ would be able to verify that $\rho$ is true, and so $T\proves\rho$, again contradicting consistency. By internalizing the previous arguments to PA, we see that $\PA+\Con(T)$ will prove that neither $\rho$ nor $\neg\rho$ are provable in $T$, making $\rho$ vacuously true in this case and also establishing $\Con(T+\rho)$ and $\Con(T+\neg\rho)$, for the second and third statements. In particular, $T+\Con(T)\proves\Con(T+\rho)$, which implies that $T+\rho$ does not prove $\Con(T)$ by the incompleteness theorem applied to the theory $T+\rho$, for the fourth statement.

Let’s now proceed to the proof of the theorem. To begin, we construct what I call the Rosser tree over a c.e. theory $T$. Namely, we recursively define theories $R_s$ for each finite binary string $s\in 2^{{<}\omega}$, placing the initial theory $R_{\emptyset}=T$ at the root, and then recursively adding either the Rosser sentence $\rho_s$ for the theory $R_s$ or its negation $\neg\rho_s$ at each stage to form the theories at the next level of the tree.
$$R_{s\concat 1}=R_s+\rho_s$$
$$R_{s\concat 0}=R_s+\neg\rho_s$$
Each theory $R_s$ is therefore a finite extension of $T$ by successively adding the appropriate Rosser sentences or their negations in the pattern described by $s$. If the initial theory $T$ is consistent, then it follows by induction using the Gödel-Rosser theorem that all the theories $R_s$ in the Rosser tree are consistent. Extending our notation to the branches through the tree, if $f\in{}^\omega 2$ is an infinite binary sequence, we let $R_f=\bigcup_n R_{f\upharpoonright n}$ be the union of the theories arising along that branch of the Rosser tree. In this way, we have constructed a perfect set of continuum many distinct consistent theories.

I shall now describe a universal algorithm for the case of computing binary functions. Consider the Rosser tree over the theory $T=\PA+\neg\Con(\PA)$. This is a consistent theory that happens to prove its own inconsistency. By considering the Gödel-codes in order, the algorithm should begin by searching for a proof of the Rosser sentence $\rho_{\emptyset}$ or its negation in the initial theory $R_{\emptyset}$. If such a proof is ever found, then the algorithm outputs $0$ or $1$ on input $0$, respectively, depending on whether it was the Rosser sentence or its negation that was found first, and moves to the next theory in the Rosser tree by adding the opposite statement to the current theory. Then, it starts searching for a proof of the Rosser sentence of that theory or its negation. At each stage in the algorithm, there is a current theory $R_s$, depending on which prior proofs have been found, and the algorithm searches for a proof of $\rho_s$ or $\neg\rho_s$. If found, it outputs $0$ or $1$ accordingly (on input $n=|s|$), and moves to the next theory in the Rosser tree by adding the opposite statement to the current theory.

If $f:\N\to 2=\{0,1\}$ is any binary function on the natural numbers, then let $R_f$ be the theory arising from the corresponding path through the Rosser tree, and let $M\models R_f$ be a model of this theory. I claim that the universal algorithm I just described will compute exactly $f(n)$ on input $n$ inside this model. The thing to notice is that because $\neg\Con(\PA)$ was part of the initial theory, the model $M$ will think that all the theories in the Rosser tree are inconsistent. So the model will have plenty of proofs of every statement and its negation for any theory in the Rosser tree, and so in particular, the function computed by $p$ in $M$ will be a total function. The question is which proofs will come first at each stage, affecting the values of the function. Let $s=f\restrict n$ and notice that $R_s$ is true in $M$. Suppose inductively that the function computed by $p$ has worked correctly below $n$ in $M$, and consider stage $n$ of the procedure. By induction, the current theory will be exactly $R_s$, and the algorithm will be searching for a proof of $\rho_s$ or its negation in $R_s$. Notice that $f(n)=1$ just in case $\rho_s$ is true in $M$, and because of what $\rho_s$ asserts and the fact that $M$ thinks it is provable in $R_s$, it must be that there is a smaller proof of $\neg\rho_s$. So in this case, the algorithm will find the proof of $\neg\rho_s$ first, and therefore, according to the precise instructions of the algorithm, it will output $1$ on input $n$ and add $\rho_s$ (the opposite statement) to the current theory, moving to the theory $R_{s\concat 1}$ in the Rosser tree. Similarly, if $f(n)=0$, then $\neg\rho_s$ will be true in $M$, and the algorithm will therefore first find a proof of $\rho_s$, give output $0$ and add $\neg\rho_s$ to the current theory, moving to $R_{s\concat 0}$. In this way, the algorithm finds the proofs in exactly the right way so as to have $R_{f\restrict n}$ as the current theory at stage $n$ and thereby compute exactly the function $f$, as desired.

Basically, the theory $R_f$ asserts exactly that the proofs will be found in the right order in such a way that program $p$ will exactly compute $f$ on all standard finite input. So every binary function $f$ is computed by the algorithm in any model of the theory $R_f$.
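Although the actual proof searches could never feasibly be carried out, the control flow of the algorithm is simple enough to simulate. In the following Python sketch (my own illustration, not part of the argument), the model’s answer to the question of which proof is found first is mocked by an oracle; supplying the oracle corresponding to a branch $f$, as in the argument above, makes the simulated outputs track $f$ exactly:

    def universal_binary_algorithm(first_proof_found, num_inputs):
        """Simulate the universal algorithm.  The oracle first_proof_found(s)
        mocks the proof search: given the current branch s (a tuple of bits),
        it reports whether a proof of the Rosser sentence rho_s ('rho') or of
        its negation ('neg rho') is found first in the current theory R_s."""
        outputs = []
        s = ()  # current position in the Rosser tree, starting at the root
        for n in range(num_inputs):
            if first_proof_found(s) == 'neg rho':
                outputs.append(1)   # proof of neg rho_s came first: output 1,
                s = s + (1,)        # and adopt rho_s, moving to R_{s^1}
            else:
                outputs.append(0)   # proof of rho_s came first: output 0,
                s = s + (0,)        # and adopt neg rho_s, moving to R_{s^0}
        return outputs

    # In a model of R_f, the proofs appear in just the right order for f:
    # the oracle reports 'neg rho' exactly when f(|s|) = 1.
    f = [1, 0, 0, 1, 1, 0]  # any desired binary function
    oracle = lambda s: 'neg rho' if f[len(s)] == 1 else 'rho'
    print(universal_binary_algorithm(oracle, len(f)))  # recovers f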

Let me now explain how to extend the result to handle all functions $g:\N\to\N$, rather than only the binary functions as above. The idea is simply to modify the binary universal algorithm in a simple way. Any function $g:\N\to \N$ can be coded with a binary function $f:\N\to 2$ in a canonical way, for example, by having successive blocks of $1$s in $f$, separated by $0$s, with the $n^{\rm th}$ block of size $g(n)$. Let $q$ be the algorithm that runs the binary universal algorithm described above, thereby computing a binary sequence, and then extracts from that binary sequence a corresponding function from $\N$ to $\N$ (this may fail, if for example, the binary sequence is finite or if it has only finitely many $0$s). Nevertheless, for any function $g:\N\to \N$ there is a binary function $f:\N\to 2$ coding it in the way we have described, and in any model $M\models R_f$, the binary universal algorithm will compute $f$, causing this adapted algorithm to compute exactly $g$ on all standard finite input, as desired.
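To illustrate the coding, here is a minimal sketch (my own, handling just finitely many values), in which the function values are recovered as the lengths of the successive blocks of $1$s:

    def encode(g_values):
        """Code finitely many values of g as a binary string: the n-th block
        of 1s, separated by 0s, has length g(n)."""
        return '0'.join('1' * v for v in g_values)

    def decode(bits, n):
        """Extract g(n) as the length of the n-th block of 1s."""
        return len(bits.split('0')[n])

    bits = encode([3, 0, 2, 1])
    print(bits)                                  # 111001101
    print([decode(bits, n) for n in range(4)])   # [3, 0, 2, 1]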

Finally, let me describe how to extend the result to work with models of set theory, rather than models of arithmetic. Suppose that $\ZFC^+$ is a consistent c.e. extension of ZFC; perhaps it is ZFC itself, or ZFC plus some large cardinal axioms. Let $T=\ZFC^++\neg\Con(\ZFC^+)$ be a slightly stronger theory, which is also consistent, by the incompleteness theorem. Since $T$ interprets arithmetic, the theory of Rosser sentences applies, and so we may build the corresponding Rosser tree over $T$, and also we may undertake the binary universal algorithm using $T$ as the initial theory. If $f:\N\to 2$ is any binary function, then let $R_f$ be the theory arising on the corresponding branch through the Rosser tree, and suppose $M\models R_f$. This is a model of $\ZFC^+$, which also thinks that $\ZFC^+$ is inconsistent. So again, the universal algorithm will find plenty of proofs in this model, and as before, it will find the proofs in just the right order that the binary universal algorithm will compute exactly the function $f$. From this binary universal algorithm, one may again design an algorithm universal for all functions $g:\N\to\N$, as desired.

One can also get another kind of universality. Namely, there is a program $r$, such that for any finite $s\subset\N$, there is a model $M$ of $\PA$ (or $\ZFC$, etc.) such that inside the model $M$, the program $r$ will enumerate the set $s$ and nothing more. One can obtain such a program $r$ from the program $p$ of the theorem: just let $r$ run the universal binary program $p$ until a double $0$ is produced, and then interpret the finite binary string up to that point as the set $s$ to output.

Let me now also discuss another form of universality.

Corollary. There is a program $p$, such that for any model $M\models\PA+\Con(\PA)$ and any function $f:M\to M$ that is definable in $M$, there is an end-extension of $M$ to a taller model $N\models\PA$ such that in $N$, the function computed by program $p$ agrees exactly with $f$ on input in $M$.

Proof. We simply apply the main theorem inside $M$. The point is that if $M$ thinks $\Con(\PA)$, then it can build what it thinks is the tree of Rosser extensions, and it will think that each step maintains consistency. So the theory $R_f$ that it constructs will be consistent in $M$ and therefore have a model (the Henkin model) definable in $M$, which will therefore be an end-extension of $M$.
QED

This last application has a clear affinity with a theorem of Woodin’s, recently extended by Rasmus Blanck and Ali Enayat. See Victoria Gitman’s posts about her seminar talk on those theorems: Computable processes can produce arbitrary outputs in nonstandard models, continuation.

Alternative proof.  Here is an alternative elegant proof of the theorem based on the comments below of Vadim Kosoy. Let $T$ be any consistent computably axiomatizable theory interpreting PA, such as PA itself or ZFC or what have you. For any Turing machine program $e$, let $q(e)$ be a program carrying out the following procedure: on input $n$, search systematically for a finite function $h:X\to\mathbb{N}$, with $X$ finite and $n\in X$, and for a proof of the statement “program $e$ does not agree with $h$ on all inputs in $X$,” using the function $h$ simply as a list of values for this assertion. For the first such function and proof that is found, if any, give as output the value $h(n)$.

Since the function $e\mapsto q(e)$ is computable, there is by Kleene’s recursion theorem a program $p$ for which $p$ and $q(p)$ compute the same function, and furthermore, $T$ proves this.  So the program $p$ is searching for proofs that $p$ itself does not behave in a certain way, and then it is behaving in that way when such a proof is found.

I claim that the theory $T$ does not actually prove any of those statements, “program $p$ does not agree with $h$ on inputs in $X$,” for any particular finite function $h:X\to\mathbb{N}$. If it did prove such a statement, then for the smallest such function and proof, the output of $p$ would indeed be $h$ on all inputs in $X$, by design. Thus, there would also be a proof that the program did agree with this particular $h$, and so $T$ would prove a contradiction, contrary to our assumption that it was consistent. So $T$ actually proves none of those statements. In particular, the program $p$ computes the empty function in the standard model of arithmetic. But also, for any particular finite function $h:X\to\mathbb{N}$, we may consistently add the assertion “program $p$ agrees with $h$ on inputs in $X$” to $T$, since $T$ did not refute this assertion.

For any function $f:\mathbb{N}\to\mathbb{N}$, let $T_f$ be the theory $T$ together with all assertions of the form “program $p$ halts on input $n$ with value $k$”, for the particular value $k=f(n)$.  I claim that this theory is consistent, for if it is not, then by compactness there would be finitely many of the assertions that enable the inconsistency, and so there would be a finite function $h:X\to\mathbb{N}$, with $h=f\upharpoonright X$, such that $T$ proved the program $p$ does not agree with $h$ on inputs in $X$. But in the previous paragraph, we proved that this doesn’t happen. And so the theory $T_f$ is consistent.

Finally, note that in any model of $T_f$, the program $p$ computes the function $f$ on standard input, because these assertions are exactly made in the theory. QED

Famous quotations in their original first-order language

Historians everywhere are shocked by the recent discovery that many of our greatest thinkers and poets had first expressed their thoughts and ideas in the language of first-order predicate logic, and sometimes modal logic, rather than in natural language. Some early indications of this were revealed in the pioneering historical research of Henle, Garfield and Tymoczko, in their work Sweet Reason.

We now know that the phenomenon is widespread!  As shown below, virtually all of our cultural leaders have first expressed themselves in the language of first-order predicate logic, before having been compromised by translations into the vernacular.

[Image: The Rolling Stones]

$\neg\lozenge\neg\exists s\ G(i,s)$

[Image: Hamlet]

$(\exists x\ x=i)\vee\neg(\exists x\ x=i)$

[Image: Lorde]

$\left(\strut\neg\exists t\ \exists d\ \strut D(d)\wedge F(d)\wedge S_t(i,d)\right)\wedge\left(\strut\neg\exists t\ w\in_t \text{Ro}\right)\wedge\left(\strut \text{Ru}(i,y)\to \lozenge\text{C}(y,i,qb)\wedge \text{Ru}(i)\wedge\text{Ru}(i)\wedge\text{Ru}(i)\wedge\text{Ru}(i)\right)$

$\exists!t\ T(t) \wedge \forall t\ (T(t)\to G(t))$

[Image: Richard Dawkins]

$\neg B_i \exists g\ G(g)$

[Image: key signature of C major]

$\forall b\ \left(\strut G(b)\wedge B(b)\to \exists x\ (D(b,x)\wedge F(x))\right)$

[Image: Moby Dick, the final chase]

$(\exists!w\ W_1(w)\wedge W_2(w)), \ \ \exists w\ W_1(w)\wedge W_2(w)\wedge S(y,w)$?

[Image: The Beatles in America]

$\exists s\ Y(s)\wedge S(s)\wedge \forall x\ L(x,s)$

[Image: Statue of Peter Pan and Tinker Bell, Dunedin Botanic Gardens, New Zealand]

$\exists p\ \left[\forall c\ (c\neq p\to G(c))\right]\wedge\neg G(p)$

[Image: Robert Plant]

$\exists l\ \left[L(l)\wedge \Box_l\left({}^\ulcorner\,\forall g\ \text{Gl}(g)\to \text{Gd}(g){}^\urcorner\right)\wedge\exists s\ \left(SH(s)\wedge B(l,s)\right)\right]$

$(\forall p\in P\ \exists c\in\text{Ch}\ c\in p)\wedge(\forall g\in G\ \exists c\in\text{Cr}\ c\in g)$

[Image: Franklin D. Roosevelt, fireside chat]

$\forall x (F(w,x)\leftrightarrow x=F)$

[Image: Lewis Carroll]

$B\wedge \forall x\ \left[S(x)\wedge T(x)\to \exists!w\ W(w)\wedge\text{Gy}(x,w)\wedge\text{Gi}(x,w)\right]$

[Image: Oscar Wilde]

$\exists!x\ D(x)\wedge D(\ {}^\ulcorner D(i){}^\urcorner\ )$

[Image: Leo Tolstoy]

$\forall f\ \forall g\ \left(\strut H(f)\wedge H(g)\to f\sim g\right)\wedge\forall f\ \forall g\ \left(\strut\neg H(f)\wedge \neg H(g)\to \neg\ f\sim g\right)$

[Image: There was an old woman who lived in a shoe (W. W. Denslow)]

$\exists w\ \left(\strut O(w)\wedge W(w)\wedge\exists s\ (S(s)\wedge L(w,s))\right)$

[Image: René Descartes]

$C(i)\to \exists x\ x=i$

[Image: Elvis Presley]

$\neg\neg\left(\strut H(y)\wedge D(y)\right)$

[Image: Judy Garland in The Wizard of Oz]

$\neg (d\in K)\wedge\neg (t\in K)$

[Image: Meat Loaf]

$W(i,y)\wedge N(i,y)\wedge\neg\neg\lozenge L(i,y)\wedge \left(\strut \neg\ \frac23<0\to\neg S(y)\right)$

[Image: Marlon Brando in On the Waterfront]

$\lozenge \text{CL}(i)\wedge\lozenge C(i)\wedge \lozenge (\exists x\ x=i)\wedge B(i)$

$\forall x\ K_x({}^\ulcorner \forall m\ \left[M(m)\wedge S(m)\wedge F(m)\to\Box\ \exists w\ M(m,w)\right]{}^\urcorner)$

[Image: Dr. Seuss (Theodor Seuss Geisel)]

$\forall e\forall h\ \left(\strut G(e)\wedge E(e)\wedge H(h)\to \neg L(i,e,h)\right)$

[Image: Bob Dylan]

$\forall p\ \Box\text{St}(p)$

[Image: The Beach Boys]

$\lozenge^w_i\ \forall g\in G\ \lozenge (g\in C)$

[Image: T. S. Eliot]

$\forall m\ (a\leq_C m)$

[Image: Charles Dickens]

$\forall t\ (p\geq t)\wedge \forall t\ (p\leq t)$

[Image: Emily Dickinson]

$\forall x\ (F(x)\iff x=h)$

[Image: George Orwell]

$(\forall x\ \forall y\ x=y)\wedge(\exists x\ \exists y ([\![x=x]\!]>[\![y=y]\!]))$

[Image: Cosette, from Les Misérables]

$\forall p\ \left(\strut\neg W(p)\to \neg S(p)\right)$

$\exists x\Box\Box x\wedge \exists x\Box\neg\Box x\wedge\exists x\neg\Box\neg\Box x$

[Image: Gustave Doré, Dante’s Inferno, Canto III: “Abandon all hope ye who enter here”]

$\forall p \left(\strut E(p)\to \forall h\in H\ A(p,h)\right)$

Dear readers, in order to assist with this important historical work, please provide translations into ordinary English in the comment section below of any or all of the assertions listed above. We are interested to make sure that all our assertions and translations are accurate.

In addition, any readers who have any knowledge of additional instances of famous quotations that were actually first made in the language of first-order predicate logic (or similar) are encouraged to post comments below detailing their knowledge. I will endeavor to add such additional examples to the list.

Thanks to Philip Welch, to my brother Jonathan, and to Ali Sadegh Daghighi (in the comments) for providing some of the examples, and to Timothy Gowers for some improvements.

Please post comments or send me email if hints are desired.


Diamond on the ordinals

I was recently surprised to discover that if there is a definable well-ordering of the universe, then the diamond principle on the ordinals holds for definable classes, automatically. In fact, the diamond principle for definable classes is simply equivalent in ZFC to the existence of a definable well-ordering of the universe. It follows as a consequence that the diamond principle for definable classes, although seeming to be fundamentally scheme-theoretic, is actually expressible in the first-order language of set theory.

In set theory, the diamond principle asserts the existence of a sequence of objects $A_\alpha$, of growing size, such that any large object at the end is very often anticipated by these approximations.  In the case of diamond on the ordinals, what we will have is a definable sequence of $A_\alpha\subseteq\alpha$, such that for any definable class of ordinals $A$ and any definable class club set $C$, there are ordinals $\theta\in C$ with $A\cap\theta=A_\theta$.  This kind of principle typically allows one to undertake long constructions that will diagonalize against all the large objects, by considering and reacting to their approximations $A_\alpha$. Since every large object $A$ is often correctly approximated that way, this enables many such constructions to succeed.

Let me dive right in to the main part of the argument.$\newcommand\restrict\upharpoonright
\newcommand\of\subseteq
\newcommand\Ord{\text{Ord}}
\newcommand\HOD{\text{HOD}}\newcommand\ZFC{\text{ZFC}}$

Theorem. In $\ZFC$, if there is a definable well-ordering of the universe, then $\Diamond_{\Ord}$ holds for definable classes. That is, there is a $p$-definable sequence $\langle A_\alpha\mid\alpha<\Ord\rangle$, such that for any definable class $A\of\Ord$ and any definable closed unbounded class of ordinals $C\of\Ord$ (allowing parameters), there is some $\theta\in C$ with $A\cap\theta=A_\theta$.

Proof. The theorem is proved as a theorem scheme; namely, I shall provide a specific definition for the sequence $\vec A=\langle A_\alpha\mid\alpha<\Ord\rangle$, using the same parameter $p$ as the definition of the global well-order and with a definition of closely related syntactic complexity, and then prove as a scheme, a separate statement for each definable class $A\of\Ord$ and class club $C\of\Ord$, that there is some $\alpha\in C$ with $A\cap\alpha=A_\alpha$. The definitions of the classes $A$ and $C$ may involve parameters and have arbitrary complexity.

Let $\lhd$ be the definable well-ordering of the universe, definable by a specific formula using some parameter $p$. I define the $\Diamond_{\Ord}$-sequence $\vec A=\langle A_\alpha\mid\alpha<\Ord\rangle$ by transfinite recursion. Suppose that $\vec A\restrict\theta$ has been defined. I shall let $A_\theta=\emptyset$ unless $\theta$ is a $\beth$-fixed point above the rank of $p$ and there is a set $A\of\theta$ and a closed unbounded set $C\of\theta$, with both $A$ and $C$ definable in the structure $\langle V_\theta,\in\rangle$ (allowing parameters), such that $A\cap\alpha\neq A_\alpha$ for every $\alpha\in C$. In this case, I choose the least such pair $(A,C)$, minimizing first on the maximum of the logical complexities of the definitions of $A$ and of $C$, and then minimizing on the total length of the defining formulas of $A$ and $C$, and then minimizing on the Gödel codes of those formulas, and finally on the parameters used in the definitions, using the well-order $\lhd\restrict V_\theta$. For this minimal pair, let $A_\theta=A$. This completes the definition of the sequence $\vec A=\langle A_\alpha\mid\alpha\in\Ord\rangle$.

Let me remark on a subtle point, since the meta-mathematical issues loom large here. The definition of $\vec A$ is internal to the model, and at stage $\theta$ we ask about subsets of $\theta$ definable in $\langle V_\theta,\in\rangle$, using the truth predicate for this structure. If we were to run this definition inside an $\omega$-nonstandard model, it could happen that the minimal formula we get is nonstandard, and in this case, the set $A$ would not actually be definable by a standard formula. Also, even when $A$ is definable by a standard formula, it might be paired (with some choice of parameters) with a club set $C$ that is defined only by a nonstandard formula (and this is why we minimize on the maximum of the complexities of the definitions of $A$ and $C$ together). So one must take care in the main argument to keep straight the distinction between the meta-theoretic natural numbers and the internal natural numbers of the object theory $\ZFC$.

Let me now prove that the sequence $\vec A$ is indeed a $\Diamond_{\Ord}$-sequence for definable classes. The argument follows in spirit the classical proof of $\Diamond$ in $L$, subject to the meta-mathematical issues I mentioned. If the sequence $\vec A$ is not a diamond sequence, then there is some definable class $A\of\Ord$, defined in $\langle V,\in\rangle$ by a specific formula $\varphi$ and parameter $z$, and a definable club $C\of\Ord$, defined by some $\psi$ and parameter $y$, with $A\cap\alpha\neq A_\alpha$ for every $\alpha\in C$. We may assume without loss that these formulas are chosen so as to be minimal in the sense of the construction, so that the maximum of the complexities of $\varphi$ and $\psi$ is as small as possible, and then the lengths of the formulas, the Gödel codes, and finally the parameters $z,y$ are successively $\lhd$-minimized. Let $m$ be a sufficiently large natural number, larger than the complexity of the definitions of $\lhd$, $A$, $C$, and large enough so that the minimality condition we just discussed is expressible by a $\Sigma_m$ formula. Let $\theta$ be any $\Sigma_m$-correct ordinal above the ranks of the parameters used in the definitions. It follows that the restrictions $\lhd\restrict V_\theta$ and also $A\cap\theta$ and $C\cap\theta$ and $\vec A\restrict\theta$ are definable in $\langle V_\theta,\in\rangle$ by the same definitions and parameters as their counterparts in $V$, that $C\cap\theta$ is club in $\theta$, and that $A\cap\theta$ and $C\cap\theta$ form a minimal pair using those definitions with $A\cap\alpha\neq A_\alpha$ for any $\alpha\in C\cap\theta$. Thus, by the definition of $\vec A$, it follows that $A_\theta=A\cap\theta$. Since $C\cap\theta$ is unbounded in $\theta$ and $C$ is closed, it follows that $\theta\in C$, and so $A_\theta=A\cap\theta$ contradicts our assumption about $A$ and $C$. So there are no such counterexample classes, and thus $\vec A$ is a $\Diamond_{\Ord}$-sequence with respect to definable classes, as claimed.
QED

Theorem. The following are equivalent over $\ZFC$.

  1. There is a definable well-ordering of the universe, using some set parameter $p$.
  2. $V=\HOD_{\{p\}}$, for some set $p$.
  3. $\Diamond_{\Ord}$ holds for definable classes. That is, there is a set parameter $p$ and a definable sequence $\vec A=\langle A_\alpha\mid\alpha<\Ord\rangle$, such that for any definable class $A\of\Ord$ and definable class club $C\of\Ord$, there is some $\alpha\in C$ with $A\cap\alpha=A_\alpha$.

Proof. Let me first give the argument, and then afterward discuss some issues about the formalization, which involves some subtle issues.

($1\to 2$) $\newcommand\rank{\text{rank}}$Suppose that $\lhd$ is a $p$-definable well-ordering of $V$, which means that every nonempty set has a $\lhd$-minimal element. Let us refine this order by defining $x\lhd' y$, just in case $\rank(x)<\rank(y)$ or $\rank(x)=\rank(y)$ and $x\lhd y$. The new order is also a well-order, which now respects rank. In particular, the order $\lhd'$ is set-like, and so every object $x$ is the $\alpha^{th}$ element with respect to the $\lhd'$-order, for some ordinal $\alpha$. Thus, every object is definable from $p$ and an ordinal, and so $V=\HOD_{\{p\}}$, as desired.

($2\to 1$) If $V=\HOD_{\{p\}}$, then we have the canonical well-order of $\HOD$ using parameter $p$, similar to how one shows that the axiom of choice holds in $\HOD$. Namely, define $x\lhd y$ if and only if $\rank(x)<\rank(y)$, or the ranks are the same, but $x$ is definable from $p$ and ordinal parameters in some $V_\theta$ with a smaller $\theta$ than $y$ is, or the ranks are the same and the $\theta$ is the same, but $x$ is definable in that $V_\theta$ by a formula with a smaller Gödel code, or with the same formula but smaller ordinal parameters. It is easy to see that this is a $p$-definable well-ordering of the universe.

($1\to 3$) This is the content of the theorem above.

($3\to 1$) If $\vec A$ is a $p$-definable $\Diamond_{\Ord}$-sequence for definable classes, then it is easy to see that if $A$ is a set of ordinals, then $A$ must arise as $A_\alpha$ for unboundedly many $\alpha$, since for any ordinal $\beta$ we may apply the diamond property to the class $A$ and the club of ordinals above both $\beta$ and $\sup A$, where $A\cap\alpha=A$. In $\ZFC$, using the axiom of choice, it is a standard fact that every set is coded by a set of ordinals. So let us define that $x\lhd y$, just in case $x$ is coded by a set of ordinals that appears earlier on $\vec A$ than any set of ordinals coding $y$. This is clearly a well-ordering, since the map sending $x$ to the least ordinal $\alpha$ for which $A_\alpha$ codes $x$ is an $\Ord$-ranking of $\lhd$. So there is a $p$-definable well-ordering of the universe.
QED

An observant reader will notice some meta-mathematical issues concerning the previous theorem. The issue is that statements 1 and 2 are known to be expressible by statements in the first-order language of set theory, as single statements, but for statement 3 we have previously expressed it only as a scheme of first-order statements. So how can they be equivalent? The answer is that the full scheme-theoretic content of statement 3 follows already from instances in which the complexity of the definitions of $A$ and $C$ are bounded. Basically, once one gets the global well-order, then one can construct a $\Diamond_{\Ord}$-sequence that works for all definable classes. In this sense, we may regard the diamond principle $\Diamond_{\Ord}$ for definable classes as not really a scheme of statements, but rather equivalent to a single first-order assertion.

Lastly, let me consider the content of the theorems in Gödel-Bernays set theory or Kelley-Morse set theory. Of course, we know that there can be models of these theories that do not have $\Diamond_{\Ord}$ in the full second-order sense. For example, it is relatively consistent with ZFC that an inaccessible cardinal $\kappa$ does not have $\Diamond_\kappa$, and in this case, the structure $\langle V_\kappa,\in,V_{\kappa+1}\rangle$ will satisfy GBC and even KM, but it won’t have $\Diamond_{\Ord}$ with respect to all classes, even though it has a definable well-ordering of the universe (since there is such a well-ordering in $V_{\kappa+1}$). But meanwhile, there will be a $\Diamond_{\Ord}$-sequence that works with respect to classes that are definable from that well-ordering and parameters, simply by following the construction given in the theorem.

This leads to several extremely interesting questions, about which I am currently thinking, concerning instances where we can have $\Diamond_\kappa$ for definable classes in $V_\kappa$, even when the full $\Diamond_\kappa$ fails. Stay tuned!

Set-theoretic mereology: the theory of the subset relation is decidable

In this post I’d like to explain a certain aspect of my on-going project with Makoto Kikuchi on set-theoretic mereology, which is set theory, undertaken in the full set-theoretic universe $V$, but using only the inclusion (subset) relation $\newcommand\of{\subseteq}\of$, rather than the element-of relation $\in$.  (See my earlier post, Different models of set theory with the same subset relation) The subset relation is of course a partial order and indeed a lattice order, since any two sets $a$ and $b$ have a least upper bound, the union $\newcommand\union{\cup}a\union b$, and a greatest lower bound, the intersection $\newcommand\intersect{\cap}a\intersect b$. Furthermore, the lattice has a least element $\emptyset$, but no greatest element, and for any set $a$, the collection of sets with $\emptyset\of b\of a$ forms an atomic Boolean algebra.

To assist with the analysis, let’s work a bit more generally. Recall that a partial order is a lattice order, if any two elements have a least upper bound (join) and greatest lower bound (meet), which I shall denote by $a\union b$ and $a\intersect b$, respectively, since I am interested in the set-theoretic cases; similarly, I’ll denote the order by $a\of b$. Let me define that a lattice is locally Boolean, if there is a least element $0$, and for every $a$, the interval $[0,a]$ is a Boolean algebra. Such a lattice is unbounded, if there is no maximal element. In such a lattice, an atom is a non-zero element that is minimal above $0$, and the lattice is atomic, if every element is the least upper bound of the atoms below it. For each natural number $n$, let us introduce the unary predicate denoted $|x|\geq n$, which expresses that $x$ admits a decomposition as the join of $n$ distinct nonzero incompatible elements: $x=y_1\union\cdots\union y_n$, where $y_i\neq 0$ and $y_i\intersect y_j=0$ for $i\neq j$. In an atomic locally Boolean lattice, the relation $|x|\geq n$ holds just in case there are at least $n$ atoms $a\leq x$.

To give a few examples, if $\newcommand\HF{\text{HF}}\HF$ is the set of hereditarily finite sets, then $\langle\HF,\of\rangle$, using the usual subset relation, is an unbounded atomic locally Boolean lattice. More generally, if $V$ is any model of set theory (even a very weak theory is sufficient), then $\langle V,\of\rangle$ is an unbounded atomic locally Boolean lattice.

I should like to prove here that the theory of unbounded atomic locally Boolean lattice orders is decidable, and furthermore admits elimination of quantifiers down to the language including the Boolean operations and the relations expressing the height or size of an object, $|x|\geq n$ and $|x|=n$.

Theorem. Every formula in the language of lattices is equivalent, over the theory of unbounded atomic locally Boolean lattices, to a quantifier-free formula in the language of the order $a\of b$, equality $a=b$, meet $a\intersect b$, join $a\union b$, relative complement $a-b$, constant $0$, the unary relation $|x|\geq n$, and the unary relation $|x|=n$, where $n$ is respectively any natural number.

Proof. We prove the result by induction on formulas. The collection of formulas equivalent to a quantifier-free formula in that language clearly includes all atomic formulas and is closed under Boolean combinations. So it suffices to eliminate the quantifier in a formula of the form $\exists x\, \varphi(x,\ldots)$, where $\varphi(x,\ldots)$ is quantifier-free in that language. Let us make a number of observations that will enable various simplifying assumptions about the form of $\varphi$.

Because equality of terms is expressible by the identity $a=b\iff a\of b\of a$, we do not actually need $=$ in the language (and here I refer to the use of equality in atomic formulas of the form $s=t$ where $s$ and $t$ are terms, and not to the incidental appearance of the symbol $=$ in the unary predicate $|x|=n$, which is an unrelated use of this symbol, a mere stylistic flourish). Similarly, in light of the equivalence $a\of b\iff |a-b|=0$, we do not need to make explicit reference to the order $a\of b$. So we may assume that all atomic assertions in $\varphi$ have the form $|t|\geq n$ or $|t|=n$ for some term $t$ in the language of meet, join, relative complement and $0$. We may omit the need for explicit negation in the formula by systematically applying the equivalences:
$$\neg(|t|\geq n)\iff \bigvee_{k<n}|t|=k\quad\text{ and}$$
$$\neg(|t|=n)\iff (|t|\geq n+1)\vee\bigvee_{k<n}|t|=k.$$
So we have reduced to the case where $\varphi$ is a positive Boolean combination of expressions of the form $|t|\geq n$ and $|t|=n$.

Let us consider the form of the terms $t$ that may arise in the formula. List all the variables $x=x_0,x_1,\ldots,x_N$ that arise in any of the terms appearing in $\varphi$, and consider the Venn diagram corresponding to these variables. The cells of this Venn diagram can each be described by a term of the form $\bigwedge_{i\leq N} \pm x_i$, which I shall refer to as a cell term, where $\pm x_i$ means that either $x_i$ appears or else we have subtracted $x_i$ from the other variables. Since we have only relative complements in a locally Boolean lattice, however, and not absolute complements, we need only consider the cells where at least one variable appears positively, since the exterior region in the Venn diagram is not actually represented by any term. In this way, every term in the language of locally Boolean lattices is a finite union of such cell terms, plus $\emptyset$ (which I suppose can be viewed as an empty union). Note that distinct cell terms represent disjoint objects in the lattice.

Next, by considering the possible sizes of $s-t$, $s\intersect t$ and $t-s$, observe that
$$|s\union t|\geq n\iff \bigvee_{i+j+k=n}(|s-t|\geq i)\wedge(|s\intersect t|\geq j)\wedge(|t-s|\geq k).$$
Through repeated application of this, we may reduce any assertion about $|t|$ for a term to a Boolean combination of assertions about cell terms. (Note that size assertions about $\emptyset$ are trivially settled by the theory and can be eliminated.)
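Since this identity is doing real work in the reduction, it may be reassuring to check it mechanically. The following Python sketch (illustrative only) verifies the displayed equivalence by brute force in the lattice of subsets of a five-element set, where $|x|\geq n$ is the literal size assertion:

    from itertools import combinations

    def rhs(s, t, n):
        """The disjunction over i+j+k = n of the three cell-size assertions."""
        return any(len(s - t) >= i and len(s & t) >= j and len(t - s) >= n - i - j
                   for i in range(n + 1) for j in range(n + 1 - i))

    universe = range(5)
    subsets = [set(c) for r in range(6) for c in combinations(universe, r)]
    assert all((len(s | t) >= n) == rhs(s, t, n)
               for s in subsets for t in subsets for n in range(7))
    print('equivalence verified on all subsets of a 5-element set')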

Let us now focus on the quantified variable $x$ separately from the other variables, for it may appear either positively or negatively in such a cell term. More precisely, each cell term in the variables $x=x_0,x_1,\ldots,x_N$ is equivalent to $x\intersect c$ or $c-x$, for some cell term $c$ in the variables $x_1,\ldots,x_N$, that is, not including $x$, or to the term $x-(x_1\union\cdots\union x_N)$, which is the cell term for which $x$ is the only positive variable.

We have reduced the problem to the case where we want to eliminate the quantifier from $\exists x\, \varphi$, where $\varphi$ is a positive Boolean combination of size assertions about cell terms. We may express $\varphi$ in disjunctive normal form and then distribute the quantifier over the disjuncts to reduce to the case where $\varphi$ is a conjunction of size assertions about cell terms. Each cell term has the form $x\intersect c$ or $c-x$ or $x-(x_1\union\cdots\union x_N)$, where $c$ is a cell term in the list of variables without $x$. Group the conjuncts of $\varphi$ that use the same cell term $c$ in this way together. The point now is that assertions about whether there is an object $x$ in the lattice such that certain cell terms obey various size requirements amount to the conjunction of various size requirements about cells in the variables not including $x$. For example, the assertion $$\exists x\,(|x\intersect c|\geq 3)\wedge(|x\intersect c|\geq 7)\wedge(|c-x|=2)$$ is equivalent (over the theory of unbounded atomic locally Boolean lattices) to the assertion $|c|\geq 9$, since we may simply let $x$ be all but $2$ atoms of $c$, and this will have size at least $7$, which is also at least $3$. If contradictory assertions are made, such as $\exists x\, (|x\intersect c|\geq 5\wedge |x\intersect c|=3)$, then the whole formula is equivalent to $\perp$, which can be expressed without quantifiers as $0\neq 0$.

Next, the key observation of the proof is that assertions about the existence of such $x$ for different cell terms in the variables not including $x$ will succeed or fail independently, since those cell terms are representing disjoint elements of the lattice, and so one may take the final witnessing $x$ to be the union of the witnesses for each piece. So to eliminate the quantifier, we simply group together the atomic assertions being made about the cell terms in the variables without $x$, and then express the existence assertion as a size requirement on those cell terms. For example, the assertion $$\exists x\, (|c\intersect x|\geq 5)\wedge(|c-x|=6)\wedge (|d\intersect x|\geq 7),$$ where $c$ and $d$ are distinct cell terms, is equivalent to $$(|c|\geq 11)\wedge(|d|\geq 7),$$ since $c$ and $d$ are disjoint and so we may let $x$ be the appropriate part of $c$ and a suitable piece of $d$. The only remaining complication concerns instances of the term $x-(x_1\union\cdots\union x_N)$. But for these, the thing to notice is that any single positive size assertion about this term is realizable in our theory, since we have assumed that the lattice is unbounded, and so there will always be as many atoms as desired disjoint from any finite list of objects. But we must again pay attention to whether the requirements expressed by distinct clauses are contradictory.
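Such reductions can likewise be checked by brute force in a finite power-set lattice. Here is a quick Python sketch (illustrative; the universe of $11$ atoms is just large enough to exhibit both sides of the equivalence) confirming the first example above:

    from itertools import combinations

    def exists_x(c, universe):
        """Search the lattice of subsets of the universe for a witness x."""
        return any(len(x & c) >= 3 and len(x & c) >= 7 and len(c - x) == 2
                   for r in range(len(universe) + 1)
                   for x in map(set, combinations(universe, r)))

    universe = set(range(11))
    for m in range(len(universe) + 1):
        c = set(range(m))  # a cell term with m atoms
        assert exists_x(c, universe) == (len(c) >= 9)
    print('the existential assertion agrees with |c| >= 9 in every case')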

Altogether, I have provided a procedure for eliminating quantifiers from any assertion in the language of locally Boolean lattices, down to the language augmented by unary predicates expressing the size of an object. This procedure works in any unbounded atomic locally Boolean lattice, and so the theorem is proved. QED

Corollary. The theory of unbounded atomic locally Boolean lattices is complete.

Proof. Every sentence in this theory is equivalent by the procedure to a quantifier-free sentence in the stated language. But since such sentences have no variables, they must simply be a Boolean combination of trivial size assertions about $0$, such as $(|0|\geq 2)\vee \neg(|0|=5)$, whose truth value is settled by the theory. QED

Corollary. The structure of hereditarily finite sets $\langle\HF,\of\rangle$ is an elementary substructure of the entire set-theoretic universe $\langle V,\of\rangle$, with the inclusion relation.

Proof. These structures are both unbounded atomic locally Boolean lattices, and so they each support the quantifier-elimination procedure. But they agree on the truth of any quantifier-free assertion about the sizes of hereditarily finite sets, and so they must agree on all truth assertions about objects in $\HF$. QED

Corollary. The structure $\langle V,\of\rangle$ has a decidable theory. The structure $\langle\HF,\of\rangle$ has a decidable elementary diagram, and hence a computably decidable presentation.

Proof. The theory is the theory of unbounded atomic locally Boolean lattices. Since the structure $\langle\HF,\of\rangle$ has a computable presentation via the Ackermann coding of hereditarily finite sets, for which the subset relation and the size relations are computable, it follows that we may also compute the truth of any formula by first reducing it to a quantifier-free assertion of those types. So this is a computably decidable presentation. QED

Being HOD-of-a-set is invariant throughout the generic multiverse

$\newcommand\HOD{\text{HOD}}$The axiom $V=\HOD$, introduced by Gödel, asserts that every set is ordinal definable. This axiom has a subtler foundational aspect than might at first be expected. The reason is that the general concept of “object $x$ is definable using parameter $p$” is not in general first-order expressible in set theory; it is of course a second-order property, which makes sense only relative to a truth predicate, and by Tarski’s theorem, we can have no first-order definable truth predicate. Thus, the phrase “definable using ordinal parameters” is not directly meaningful in the first-order language of set theory without further qualification or explanation. Fortunately, however, it is a remarkable fact that when we allow definitions to use arbitrary ordinal parameters, as we do with $\HOD$, then we can in fact make such qualifications in such a way that the axiom becomes first-order expressible in set theory. Specifically, we say officially that $V=\HOD$ holds, if for every set $x$, there is an ordinal $\theta$ with $x\in V_\theta$, for which $x$ is definable by some formula $\psi(x)$ in the structure $\langle V_\theta,{\in}\rangle$ using ordinal parameters. Since $V_\theta$ is a set, we may freely make reference to first-order truth in $V_\theta$ without requiring any truth predicate in $V$. Certainly any such $x$ as this is also ordinal-definable in $V$, since we may use $\theta$ and the Gödel-code of $\psi$ also as parameters, and note that $x$ is the unique object such that it is in $V_\theta$ and satisfies $\psi$ in $V_\theta$. (Note that inside an $\omega$-nonstandard model of set theory, we may really need to use $\psi$ as a parameter, since it may be nonstandard, and $x$ may not be definable in $V_\theta$ using a meta-theoretically standard natural number; but fortunately, the Gödel code of a formula is an integer, which is still an ordinal, and this is the key to the issue.) Conversely, if $x$ is definable in $V$ using formula $\varphi(x,\vec\alpha)$ with ordinal parameters $\vec\alpha$, then it follows by the reflection theorem that $x$ is defined by $\varphi(x,\vec\alpha)$ inside some $V_\theta$. So this formulation of $V=\HOD$ is expressible and exactly captures the desired second-order property that every set is ordinal-definable.

Consider next the axiom $V=\HOD(b)$, asserting that every set is definable from ordinal parameters and parameter $b$. Officially, as before, $V=\HOD(b)$ asserts that for every $x$, there is an ordinal $\theta$, formula $\psi$ and ordinals $\vec \alpha<\theta$, such that $x$ is the unique object in $V_\theta$ for which $\langle V_\theta,{\in}\rangle\models\psi(x,\vec\alpha,b)$, and the reflection argument shows again that this way of defining the axiom exactly captures the intended idea.

The axiom I actually want to focus on is $\exists b\,\left( V=\HOD(b)\right)$, asserting that the universe is $\HOD$ of a set. (I assume ZFC in the background theory.) It turns out that this axiom is constant throughout the generic multiverse.

Theorem. The assertion $\exists b\, (V=\HOD(b))$ is forcing invariant.

  • If it holds in $V$, then it continues to hold in every set forcing extension of $V$.
  • If it holds in $V$, then it holds in every ground of $V$.

Thus, the truth of this axiom is invariant throughout the generic multiverse.

Proof. Suppose that $V$ satisfies $\text{ZFC}+V=\HOD(b)$, and $V[G]$ is a forcing extension of $V$ by generic filter $G\subset\mathbb{P}\in V$. By the ground-model definability theorem, it follows that $V$ is definable in $V[G]$ from parameter $P(\mathbb{P})^V$. Thus, using this parameter, as well as $b$ and additional ordinal parameters, we can define in $V[G]$ any particular object in $V$. Since this includes all the $\mathbb{P}$-names used to form $V[G]$, it follows that $V[G]=\HOD(b,P(\mathbb{P})^V,G)$, and so $V[G]$ is $\HOD$ of a set, as desired.

Conversely, suppose that $W$ is a ground of $V$, so that $V=W[G]$ for some $W$-generic filter $G\subset\mathbb{P}\in W$, and $V=\HOD(b)$ for some set $b$. Let $\dot b$ be a name for which $\dot b_G=b$. Every object $x\in W$ is definable in $W[G]$ from $b$ and ordinal parameters $\vec\alpha$, so there is some formula $\psi$ for which $x$ is unique such that $\psi(x,b,\vec\alpha)$. Thus, there is some condition $p\in\mathbb{P}$ such that $x$ is unique such that $p\Vdash\psi(\check x,\dot b,\check{\vec\alpha})$. If $\langle p_\beta\mid\beta<|\mathbb{P}|\rangle$ is a fixed enumeration of $\mathbb{P}$ in $W$, then $p=p_\beta$ for some ordinal $\beta$, and we may therefore define $x$ in $W$ using ordinal parameters, along with $\dot b$ and the fixed enumeration of $\mathbb{P}$. So $W$ thinks the universe is $\HOD$ of a set, as desired.

Since the generic multiverse is obtained by iteratively moving to forcing extensions or to grounds, and each such movement preserves the axiom, it follows that $\exists b\, (V=\HOD(b))$ is constant throughout the generic multiverse. QED

Theorem. If $V=\HOD(b)$, then there is a forcing extension $V[G]$ in which $V=\HOD$ holds.

Proof. We are working in ZFC. Suppose that $V=\HOD(b)$. We may assume $b$ is a set of ordinals, since such sets can code any given set. Consider the following forcing iteration: first add a Cohen real $c$, and then perform forcing $G$ that codes $c$, $P(\omega)^V$ and $b$ into the GCH pattern at uncountable cardinals, and then perform self-encoding forcing $H$ above that coding, coding also $G$ (see my paper on Set-theoretic geology for further details on self-encoding forcing). In the final model $V[c][G][H]$, therefore, the objects $c$, $b$, $P(\omega)^V$, $G$ and $H$ are all definable without parameters. Since $V\subset V[c][G][H]$ has a closure point at $\omega$, it satisfies the $\omega_1$-approximation and cover properties, and therefore the class $V$ is definable in $V[c][G][H]$ using $P(\omega)^V$ as a parameter. Since this parameter is itself definable without parameters, it follows that $V$ is parameter-free definable in $V[c][G][H]$. Since $b$ is also definable there, it follows that every element of $\HOD(b)^V=V$ is ordinal-definable in $V[c][G][H]$. And since $c$, $G$ and $H$ are also definable without parameters, we have $V[c][G][H]\models V=\HOD$, as desired. QED

Corollary. The following are equivalent.

  1. The universe is $\HOD$ of a set: $\exists b\, (V=\HOD(b))$.
  2. Somewhere in the generic multiverse, the universe is $\HOD$ of a set.
  3. Somewhere in the generic multiverse, the axiom $V=\HOD$ holds.
  4. The axiom $V=\HOD$ is forceable.

Proof. This is an immediate consequence of the previous theorems. $1\to 4\to 3\to 2\to 1$. QED

Corollary. The axiom $V=\HOD$, if true, even if true anywhere in the generic multiverse, is a switch.

Proof. A switch is a statement such that both it and its negation are necessarily possible by forcing; that is, in every set forcing extension, one can force the statement to be true and also force it to be false. We can always force $V=\HOD$ to fail, simply by adding a Cohen real. If $V=\HOD$ is true, then by the first theorem, every forcing extension has $V=\HOD(b)$ for some $b$, in which case $V=\HOD$ remains forceable, by the second theorem. QED

Different models of set theory with the same subset relation

Recently Makoto Kikuchi (Kobe University) asked me the following interesting question, which arises very naturally if one should adopt a mereological perspective in the foundations of mathematics, placing a focus on the parthood relation rather than the element-of relation. In set theory, this perspective would lead one to view the subset or inclusion relation $\subseteq$ as the primary fundamental relation, rather than the membership $\in$ relation.

Question. Can there be two different models of set theory, with the same inclusion relation?

We spent an evening discussing it, over delicious (Rokko-michi-style) okonomiyaki and beer, just like old times, except that we are in Tokyo at the CTFM 2015, and I’d like to explain the answer, which is yes, this always happens in every model of set theory.

Theorem. In any universe of set theory $\langle V,\in\rangle$, there is a definable relation $\in^*$, different from $\in$, such that $\langle V,\in^*\rangle$ is a model of set theory, in fact isomorphic to the original universe $\langle V,\in\rangle$, for which the corresponding inclusion relation $$u\subseteq^* v\iff \forall a\, (a\in^* u\to a\in^* v)$$ is identical to the usual inclusion relation $u\subseteq v$.

Proof. Let $\theta:V\to V$ be any definable non-identity permutation of the universe, and let $\tau:u\mapsto \theta[u]=\{\ \theta(a)\mid a\in u\ \}$ be the function determined by pointwise image under $\theta$. Since $\theta$ is bijective, it follows that $\tau$ is also a bijection of $V$ to $V$, since every set is the $\theta$-image of a unique set. Furthermore, $\tau$ is an automorphism of $\langle V,\subseteq\rangle$, since $$u\subseteq v\iff\theta[u]\subseteq\theta[v]\iff\tau(u) \subseteq\tau(v).$$ I had used this idea a few years ago in my answer to the MathOverflow question, Is the inclusion version of Kunen inconsistency theorem true?, which shows that there are nontrivial $\subseteq$ automorphisms of the universe. Note that since $\tau(\{a\})=\{\theta(a)\}$, it follows that any instance of nontriviality $\theta(a)\neq a$ in $\theta$ leads immediately to an instance of nontriviality in $\tau$.

Using the map $\tau$, define $a\in^* b\iff\tau(a)\in\tau(b)$. By definition, therefore, $\tau$ is an isomorphism of $\langle V,\in^*\rangle\cong\langle V,\in\rangle$. Let us show that $\in^*\neq \in$. Since $\theta$ is nontrivial, there is an $\in$-minimal set $a$ with $\theta(a)\neq a$. By minimality, $\theta[a]=a$ and so $\tau(a)=a$. But as mentioned, $\tau(\{a\})=\{\theta(a)\}\neq\{a\}$. So we have $a\in\{a\}$, but $\tau(a)=a\notin\{\theta(a)\}=\tau(\{a\})$ and hence $a\notin^*\{a\}$. So the two relations are different.

Meanwhile, consider the corresponding subset relation. Specifically, $u\subseteq^* v$ is defined to mean $\forall a\,(a\in^* u\to a\in^* v)$, which holds if and only if $\forall a\, (\tau(a)\in\tau(u)\to \tau(a)\in\tau(v))$; but since $\tau$ is surjective, this holds if and only if $\tau(u)\subseteq \tau(v)$, which as we observed at the beginning of the proof, holds if and only if $u\subseteq v$. So the corresponding subset relations $\subseteq^*$ and $\subseteq$ are identical, as desired.

Another way to express what is going on is that $\tau$ is an isomorphism of the structure $\langle V,{\in^*},{\subseteq}\rangle$ with $\langle V,{\in},{\subseteq}\rangle$, and so $\subseteq$ is in fact the same as the corresponding inclusion relation $\subseteq^*$ that one would define from $\in^*$. QED
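Since everything here is computable on the hereditarily finite sets, the argument can even be checked by machine. The following Python sketch is merely my own finite illustration, not part of the proof: it takes $\theta$ to be the transposition swapping $\emptyset$ and $\{\emptyset\}$, and it restricts the quantifier in $\subseteq^*$ to the sixteen sets of rank at most $3$, a universe small enough to search and large enough to contain the relevant witnesses.

```python
from itertools import combinations

def powerset(xs):
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

# The finite universe: all hereditarily finite sets of rank at most 3 (16 sets).
V = [frozenset()]
for _ in range(3):
    V = powerset(V)

EMPTY = frozenset()
SING = frozenset({EMPTY})   # the singleton {emptyset}

def theta(x):
    # a non-identity permutation of the universe: transpose emptyset and {emptyset}
    return SING if x == EMPTY else EMPTY if x == SING else x

def tau(u):
    # the pointwise image of u under theta
    return frozenset(theta(a) for a in u)

def in_star(a, b):
    # a ∈* b  iff  tau(a) ∈ tau(b)
    return tau(a) in tau(b)

def subseteq_star(u, v):
    # u ⊆* v, with the quantifier restricted to the finite universe V
    return all(in_star(a, v) for a in V if in_star(a, u))

# tau is an automorphism of (V, ⊆): u ⊆ v iff tau(u) ⊆ tau(v)
assert all((u <= v) == (tau(u) <= tau(v)) for u in V for v in V)

# the derived inclusion ⊆* coincides with the usual ⊆ on every pair...
assert all((u <= v) == subseteq_star(u, v) for u in V for v in V)

# ...and yet ∈* differs from ∈, exactly as in the proof
assert EMPTY in SING and not in_star(EMPTY, SING)
```

Here the $\in$-minimal set moved by $\theta$ is $\emptyset$ itself, so $\tau(\emptyset)=\emptyset$, and the witness to $\in^*\neq\in$ is that $\emptyset\in\{\emptyset\}$ while $\emptyset\notin^*\{\emptyset\}$.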

Corollary. One cannot define $\in$ from $\subseteq$ in a model of set theory.

Proof. The map $\tau$ is a $\subseteq$-automorphism, and so it preserves every relation definable from $\subseteq$, but it does not preserve $\in$. QED

Nevertheless, I claim that the isomorphism type of $\langle V,\in\rangle$ is implicit in the inclusion relation $\subseteq$, in the sense that any other class relation $\in^*$ having that same inclusion relation is isomorphic to the $\in$ relation.

Theorem. Assume ZFC in the universe $\langle V,\in\rangle$. Suppose that $\in^*$ is a class relation for which $\langle V,\in^*\rangle$ is a model of set theory (a weak set theory suffices), such that the corresponding inclusion relation $$u\subseteq^* v\iff\forall a\,(a\in^* u\to a\in^* v)$$ is the same as the usual inclusion relation $u\subseteq v$. Then the two membership relations are isomorphic: $$\langle V,\in\rangle\cong\langle V,\in^*\rangle.$$

Proof. Since the singleton set $\{a\}$ has exactly two subsets with respect to the usual $\subseteq$ relation — the empty set and itself — this must also be true with respect to the inclusion relation $\subseteq^*$ defined via $\in^*$, since we have assumed $\subseteq^*=\subseteq$. Thus, the object $\{a\}$ is also a singleton with respect to $\in^*$, and so there is a unique object $\eta(a)$ such that $x\in^*\{a\}\iff x=\eta(a)$. By extensionality and since every object has its singleton, it follows that $\eta:V\to V$ is both one-to-one and onto. Let $\theta=\eta^{-1}$ be the inverse permutation.

Observe that $a\in u\iff \{a\}\subseteq u\iff \{a\}\subseteq^* u\iff\eta(a)\in^* u$. Thus, $$b\in^* u\iff \theta(b)\in u.$$

Using $\in$-recursion, define $b^*=\{\ \theta(a^*)\mid a\in b\ \}$. The map $b\mapsto b^*$ is one-to-one, by $\in$-induction: if injectivity already holds for the elements of $b$, then we may recover $b$ from $b^*$ by applying $\theta^{-1}$ to each element of $b^*$ and then, using the induction assumption, finding the unique $a$ whose image is that set $a^*$; the collection of these $a$ is precisely $b$. So $b\mapsto b^*$ is injective.

I claim that this map is also surjective. If $y_0\neq b^*$ for any $b$, then there must be an element of $y_0$ that is not of the form $\theta(b^*)$ for any $b$, for otherwise $y_0$ would itself be $c^*$, where $c$ is the set of the corresponding sets $b$. Since $\theta$ is surjective, this means there is $\theta(y_1)\in y_0$ with $y_1\neq b^*$ for any $b$. Continuing, there is $y_{n+1}$ with $\theta(y_{n+1})\in y_n$ and $y_{n+1}\neq b^*$ for any $b$. Let $z=\{\ \theta(y_n)\mid n\in\omega\ \}$. Since $x\in^* u\iff \theta(x)\in u$, it follows that the $\in^*$-elements of $z$ are precisely the $y_n$’s. But $\theta(y_{n+1})\in y_n$, and so $y_{n+1}\in^* y_n$. So $z$ has no $\in^*$-minimal element, violating the axiom of foundation for $\in^*$, a contradiction. So the map $b\mapsto b^*$ is a bijection of $V$ with $V$.

Finally, we observe that because $$a\in b\iff\theta(a^*)\in b^*\iff a^*\in^* b^*,$$ it follows that the map $b\mapsto b^*$ is an isomorphism of $\langle V,\in\rangle$ with $\langle V,\in^*\rangle$, as desired. QED
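In the same finite setting as the earlier Python sketch, and again purely as my own illustration, this recovery construction can be carried out literally: read off $\eta(a)$ as the unique $\in^*$-element of $\{a\}$, invert it, and define $b\mapsto b^*$ by recursion on $\in$. The searches below succeed because the rank-bounded universe $V$ from that sketch happens to be closed under the operations involved.

```python
def eta(a):
    # the unique x with x ∈* {a}; unique by the singleton argument in the proof
    (x,) = [x for x in V if in_star(x, frozenset({a}))]
    return x

def theta_inv(y):
    # the inverse permutation (called theta in the theorem's proof; renamed
    # here to avoid clashing with the theta of the previous sketch)
    (a,) = [a for a in V if eta(a) == y]
    return a

star_memo = {}
def star(b):
    # b* = { theta_inv(a*) : a ∈ b }, defined by ∈-recursion
    if b not in star_memo:
        star_memo[b] = frozenset(theta_inv(star(a)) for a in b)
    return star_memo[b]

# b ↦ b* is a bijection of the finite universe with itself...
assert set(star(b) for b in V) == set(V)

# ...and it is an isomorphism: a ∈ b iff a* ∈* b*
assert all((a in b) == in_star(star(a), star(b)) for a in V for b in V)
```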

The conclusion is that although $\in$ is not definable from $\subseteq$, nevertheless, the isomorphism type of $\in$ is implicit in $\subseteq$, in the sense that any other class relation $\in^*$ giving rise to the same inclusion relation $\subseteq^*=\subseteq$ is isomorphic to $\in$.

Meanwhile, I do not yet know what the situation is when one drops the assumption that $\in^*$ is a class with respect to the $\langle V,\in\rangle$ universe.

Question. Can there be two models of set theory $\langle M,\in\rangle$ and $\langle M,\in^*\rangle$, not necessarily classes with respect to each other, which have the same inclusion relation $\subseteq=\subseteq^*$, but which are not isomorphic?

(This question is now answered! See my joint paper with Kikuchi at Set-theoretic mereology.)

Upward countable closure in the generic multiverse of forcing to add a Cohen real

I’d like to discuss my theorem that the collection of models $M[c]$ obtained by adding an $M$-generic Cohen real $c$ over a fixed countable transitive model of set theory $M$ is upwardly countably closed, in the sense that every increasing countable chain has an upper bound.

I proved this theorem back in 2011, while at the Young Set Theory Workshop in Bonn and continuing at the London summer school on set theory, in a series of conversations with Giorgio Venturi. The argument has recently come up again in various discussions, and so let me give an account of it.

We consider the collection of all forcing extensions of a fixed countable transitive model $M$ of ZFC by the forcing to add a Cohen real, models of the form $M[c]$, and consider the question of whether every countable increasing chain of these models has an upper bound. The answer is yes!  (Actually, Giorgio wants to undertake forcing constructions by forcing over this collection of models to add a generic upward directed system of models; it follows from this theorem that this forcing is countably closed.) This theorem fits into the theme of my earlier post, Upward closure in the toy multiverse of all countable models of set theory, where similar theorems are proved, but not this one exactly.

Theorem. For any countable transitive model $M\models\text{ZFC}$, the collection of all forcing extensions $M[c]$ by adding an $M$-generic Cohen real is upward-countably closed. That is, for any countable tower of such forcing extensions
$$M[c_0]\subset M[c_1]\subset\cdots\subset M[c_n]\subset\cdots,$$
we may find an $M$-generic Cohen real $d$ such that $M[c_n]\subset M[d]$ for every natural number $n$.

Proof. $\newcommand\Add{\text{Add}}$Suppose that we have such a tower of forcing extensions $M[c_0]\subset M[c_1]\subset\cdots$. Note that if $M[b]\subset M[c]$ for $M$-generic Cohen reals $b$ and $c$, then $M[c]$ is a forcing extension of $M[b]$ by a quotient of the Cohen-real forcing. But since Cohen forcing itself has a countable dense set, all such quotients also have a countable dense set, and so (being nontrivial) they are equivalent to Cohen forcing; thus $M[c]$ is actually $M[b][b_1]$ for some $M[b]$-generic Cohen real $b_1$. Thus, we may view the tower as having the form:
$$M[b_0]\subset M[b_0\times b_1]\subset\cdots\subset M[b_0\times b_1\times\cdots\times b_n]\subset\cdots,$$
where now it follows that any finite collection of the reals $b_i$ are mutually $M$-generic.

Of course, we cannot expect in general that the combined sequence $\langle b_n\mid n<\omega\rangle$, viewed as a single real, is $M$-generic for $\Add(\omega,\omega)$, since this real may be very badly behaved. For example, the sequence of first bits of the $b_n$’s may code a very naughty real $z$, one that cannot be added by forcing over $M$ at all. So in general, we cannot arrange for this sequence itself to be an element of the limit model $M[d]$. (See further discussion in my post Upward closure in the toy multiverse of all countable models of set theory.)

We shall instead undertake a construction by making finitely many changes to each real $b_n$, resulting in a real $d_n$, in such a way that the resulting combined real $d=\oplus_n d_n$ is $M$-generic for the forcing to add $\omega$-many Cohen reals, which is of course isomorphic to adding just one. To do this, let’s get a little more clear with our notation. We regard each $b_n$ as an element of Cantor space $2^\omega$, that is, an infinite binary sequence, and the corresponding filter associated with this real is the collection of finite initial segments of $b_n$, which will be an $M$-generic filter through the partial order of finite binary sequences $2^{<\omega}$, which is one of the standard isomorphic copies of Cohen forcing. We will think of $d$ as a binary function on the plane $d:\omega\times\omega\to 2$, where the $n^{th}$ slice $d_n$ is the corresponding function $\omega\to 2$ obtained by fixing the first coordinate to be $n$.

Now, we enumerate as $D_0$, $D_1$, and so on the open dense subsets in $M$ for the forcing to add a Cohen real $\omega\times\omega\to 2$; there are only countably many such dense sets, because $M$ is countable. Now, we construct $d$ in stages. Before stage $n$, we will have completely specified $d_k$ for $k<n$, and we also may be committed to a finite condition $p_{n-1}$ in the forcing to add $\omega$ many Cohen reals. We consider the dense set $D_n$. We may factor $\Add(\omega,\omega)$ as $\Add(\omega,n)\times\Add(\omega,[n,\omega))$. Since $d_0\times\cdots\times d_{n-1}$ is actually $M$-generic (these reals are finite modifications of the corresponding $b_k$’s, which are mutually $M$-generic), it follows that there is some finite extension of our condition $p_{n-1}$ to a condition $p_n\in D_n$ that is compatible with $d_0\times\cdots\times d_{n-1}$. Let $d_n$ be the same as $b_n$, except finitely modified so as to be compatible with $p_n$. In this way, the filter determined by our final real $d=\oplus_n d_n$ will contain all the conditions $p_n$, and therefore $d$ will be $M$-generic for $\Add(\omega,\omega)$, yet every $b_n$ will differ only finitely from $d_n$ and hence be an element of $M[d]$. So we have $M[b_0][b_1]\cdots[b_n]\subset M[d]$ for every $n$, and we have found our upper bound. QED
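The bookkeeping in this construction can be illustrated with a toy computation, though of course this is only a cartoon of my own and not actual forcing, since genuine mutual genericity and the dense sets of $M$ are not things a program can inspect. Here conditions are finite partial functions on $\omega\times\omega$, the given rows $b_n$ are stand-ins for the mutually generic Cohen reals, and each toy dense requirement is met by a finite extension, so that every row of the final function differs only finitely from the given one.

```python
# Toy version of the finite-modification construction (not actual forcing):
# conditions are finite partial functions p : omega x omega -> 2, and the
# toy dense requirement D_n is "the condition places a 1 somewhere in column n".

def b(i, j):
    # stand-in for the given rows b_i; any fixed rule would do here
    return 0

def meet(n, p):
    # extend the condition p into D_n without changing anything already decided
    if any(v == 1 for (i, j), v in p.items() if j == n):
        return p
    q = dict(p)
    q[(1 + max((i for i, _ in p), default=-1), n)] = 1   # use a fresh row
    return q

p = {}
for n in range(10):            # fold in the requirements D_0, ..., D_9
    p = meet(n, p)

def d(i, j):
    # the final function: agrees with the rows b_i away from the condition p
    return p.get((i, j), b(i, j))

rows = 1 + max(i for i, _ in p)
assert all(any(d(i, n) == 1 for i in range(rows)) for n in range(10))
print(f"met 10 requirements with {len(p)} finite changes")
```

In the actual proof, of course, the $D_n$ enumerate all the open dense subsets of $\Add(\omega,\omega)$ in $M$, and the ability to extend $p_{n-1}$ compatibly with the already-specified rows is exactly where the mutual genericity of the $b_k$’s is used.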

Notice that the real $d$ we construct is not only $M$-generic, but also $M[c_n]$-generic for every $n$.

My related post, Upward closure in the toy multiverse of all countable models of set theory, which is based on material in my paper Set-theoretic geology, discusses some similar results.