First-Order Arithmetic and the Peano Axioms

Suppose you wanted to write down the rules of ordinary counting so precisely that a machine — with no idea what a number "really" is — could check every step of every proof about the whole numbers. No pictures, no hand-waving, no "you know what I mean": just symbols and rules. That is exactly what first-order arithmetic tries to do, and its most famous version is called Peano Arithmetic, or PA.

The astonishing plot twist — the thing that makes this one of the great stories of twentieth-century mathematics — is that it doesn't quite work. The rules are honest, natural, and as tight as you could reasonably ask for, and yet they fail to capture only the numbers you meant. Hiding among the models of PA are strange worlds full of "infinite" numbers, and there are true facts about 0, 1, 2, \dots that PA can never prove. This page is about how you write the rules down, what you can build from them, and where they slip through your fingers.

The signature: what we get to talk about

Before you can state any axioms you must fix a signature (or language) — the exact list of symbols allowed in your sentences. For arithmetic it is small and familiar:

a constant 0 — a name for zero;
a one-place function S — the successor, "the next number after". So S(0) is our official name for 1, S(S(0)) is 2, and in general S^n(0) is the numeral for n;
two two-place functions + and \cdot — addition and multiplication;
a two-place relation < — the usual ordering.

Everything PA says must be built from these symbols, plus logic: variables x, y, z, \dots, the connectives \wedge, \vee, \neg, \Rightarrow, equality =, and the first-order quantifiers \forall x ("for all numbers x") and \exists x ("there is a number x"). The phrase "first-order" is the crucial fine print: we may quantify over numbers, but never over sets of numbers or properties. Hold on to that restriction — it is the whole drama of the page.

The Peano axioms

Now we state the rules. The first pair pin down the successor S, making it behave like "add one, starting a fresh chain":

Zero is nobody's successor: \forall x\; \big(\, S(x) \neq 0 \,\big). Counting up never loops back to 0 — there is a genuine starting point.
Successor is injective: \forall x\,\forall y\; \big(\, S(x) = S(y) \;\Rightarrow\; x = y \,\big). Different numbers have different "next" numbers, so the chain never merges.

The next axioms are the recursive defining equations for + and \cdot. Each operation gets a base case (what happens against 0) and a step case (how it passes through a successor):

Addition. x + 0 = x, \qquad x + S(y) = S(x + y). "Adding zero changes nothing; adding one-more-than-y is one-more-than (adding y)."
Multiplication. x \cdot 0 = 0, \qquad x \cdot S(y) = (x \cdot y) + x. "Times zero is zero; times one-more-than-y is (times y) plus one more copy of x."

These four equations are not arbitrary — they are exactly the clauses of a recursive definition. They let you compute any sum or product by peeling successors off the right-hand argument one at a time until you hit the base case. There is nothing left to guess.

Worked example: unfolding 2 + 2 = 4

Let us prove something a five-year-old knows, but prove it from the axioms, so a machine could follow along. Write the numerals in full: 2 = S(S(0)) and 4 = S(S(S(S(0)))). We want S(S(0)) + S(S(0)) = S(S(S(S(0)))). Peel the successors off the second argument using the step equation x + S(y) = S(x+y):

\begin{aligned} S(S(0)) + S(S(0)) \;&=\; S\big(\, S(S(0)) + S(0) \,\big) && \text{step case, peel one } S\\ &=\; S\big(\, S(\, S(S(0)) + 0 \,) \,\big) && \text{step case again}\\ &=\; S\big(\, S(\, S(S(0)) \,) \,\big) && \text{base case } x+0=x\\ &=\; S(S(S(S(0)))) \;=\; 4. \end{aligned}

Every line cites one axiom, applied to actual symbols — no appeal to intuition about "two apples and two apples". That mechanical certainty is the prize we were after: arithmetic reduced to symbol-pushing a computer could verify.

It looks obvious that no number is its own successor, but the two successor axioms alone do not give it to you — you need induction (below). Take the property \varphi(x) to be "S(x) \neq x". Base: S(0) \neq 0 is precisely the first axiom. Step: assume S(x) \neq x; if we had S(S(x)) = S(x), injectivity would force S(x) = x, contradicting the assumption — so S(S(x)) \neq S(x). By induction, S(x) \neq x holds for every x. Notice how the "obvious" fact quietly needed the biggest axiom of all.

Induction — and why first-order PA needs a whole schema

The soul of arithmetic is mathematical induction: if a property holds of 0, and holds of S(x) whenever it holds of x, then it holds of every number. Informally we would love to write it as a single sentence quantifying over all properties \varphi:

\forall \varphi\;\Big[\, \big(\varphi(0) \,\wedge\, \forall x\,(\varphi(x) \Rightarrow \varphi(S(x)))\big) \;\Rightarrow\; \forall x\,\varphi(x) \,\Big].

But that opening \forall \varphi quantifies over properties, not numbers — it is second-order, and it is forbidden in a first-order language. So first-order PA makes a compromise. It replaces the one grand axiom with an induction schema: an infinite family of first-order axioms, one for each formula \varphi(x) you can actually write in the language of arithmetic.

For every formula \varphi(x) of the language of arithmetic (possibly with extra parameters), PA includes the axiom \big(\varphi(0) \,\wedge\, \forall x\,(\varphi(x) \Rightarrow \varphi(S(x)))\big) \;\Rightarrow\; \forall x\,\varphi(x).

This is a real difference, not a technicality. The second-order axiom ranges over all subsets of the number line — uncountably many properties, including ones no formula could ever name. The first-order schema only covers the definable properties — countably many. That gap between "all subsets" and "the ones we can write down" is precisely the crack through which the strange models below will climb.

Models of PA: the standard model, and the impostors

A model of PA is any structure — a set of "numbers" with its own 0, S, +, \cdot, < — in which every Peano axiom comes out true. The one you have in mind is the standard model \mathbb{N} = \{0, 1, 2, 3, \dots\}: a single chain marching off to infinity, every element reachable from 0 by finitely many successors.

Here is the shock. First-order PA has other models — non-standard ones — that satisfy every one of its axioms yet are not \mathbb{N}. They contain the entire standard chain 0, S0, SS0, \dots and then, sitting beyond every numeral, extra elements: "numbers" c larger than 0, 1, 2, 3, \dots all at once. The successor function still works on them, so each such c drags along its own two-way chain \dots, c-2, c-1, c, c+1, c+2, \dots that never touches 0.

Why must such impostors exist? Because of the compactness theorem. Add to the language a new constant c and throw in the infinitely many sentences c > 0,\; c > S(0),\; c > S(S(0)),\; \dots — "\(c\) is bigger than every numeral". Any finite subset of these is satisfiable inside ordinary \mathbb{N} (just pick c big enough), so by compactness the whole infinite set has a model. In that model c is a genuine element of a structure obeying all of PA, yet it is bigger than 0, 1, 2, \dots — an infinite number living inside arithmetic.

The moral: first-order PA is not categorical. Its axioms cannot single out \mathbb{N} up to isomorphism — they are outvoted by the non-standard models they cannot exclude. This is a direct consequence of using the first-order induction schema instead of the full second-order axiom.

Second-order Peano arithmetic: categorical, at a price

Give yourself the second-order induction axiom — the single one quantifying over all properties — and the impostors vanish. Dedekind proved it: any two models of the full second-order Peano axioms are isomorphic. Second-order PA is categorical; it pins down \mathbb{N} exactly. So why don't we just use that and be done?

Because the power comes at a cost. Second-order logic has no complete, effective proof system: there is no machine that can enumerate all its logical truths. The first-order schema was the compromise that kept arithmetic mechanically checkable — the very goal we started with. You can have a proof system a computer can run, or you can have categoricity, but the two together are out of reach. First-order PA keeps the machine and pays with the non-standard models.

A common first reaction is to think a non-standard model must be broken — that its "infinite numbers" secretly break some axiom. They don't. Every Peano axiom, including every instance of the induction schema, is true in a non-standard model. Induction still holds there; it just ranges over that model's own definable subsets, which do not include the set "the standard numbers only" (that set exists outside the model, not inside its language).

So a non-standard model is not a mistake or a paradox — it is a perfectly legitimate structure that first-order PA is simply unable to tell apart from \mathbb{N}. The failure is not in the model; it is in the expressive power of a first-order theory.

A glance ahead: Gödel, and truths PA can't reach

The story does not stop at non-standard models. Gödel's first incompleteness theorem says that any consistent, effectively axiomatised theory strong enough to do arithmetic — PA very much included — is incomplete: there are sentences \sigma in its own language that are true of \mathbb{N} but that PA can neither prove nor disprove.

These are not artificial logician's tricks, either. The Goodstein sequences — a game where you keep rewriting a number in fancier and fancier base notation and subtracting one — always crash down to zero eventually, a genuine theorem about ordinary integers; yet "every Goodstein sequence terminates" is unprovable in PA. The Paris–Harrington theorem, a strengthened finite Ramsey statement, is another true arithmetic fact beyond PA's reach. Both are proved true using tools (transfinite induction up to \varepsilon_0) that step outside PA — echoing, once again, that the first-order rules of counting do not catch everything true about the numbers you meant.