First-Order Arithmetic and the Peano Axioms
Suppose you wanted to write down the rules of ordinary counting so precisely that a
machine — with no idea what a number "really" is — could check every step of every proof about
the whole numbers. No pictures, no hand-waving, no "you know what I mean": just symbols and
rules. That is exactly what first-order arithmetic tries to do, and its most
famous version is called Peano Arithmetic, or PA.
The astonishing plot twist — the thing that makes this one of the great stories of twentieth-century
mathematics — is that it doesn't quite work. The rules are honest, natural, and as tight
as you could reasonably ask for, and yet they fail to capture only the numbers you
meant. Hiding among the models of PA are strange worlds full of "infinite" numbers, and there are
true facts about 0, 1, 2, \dots that PA can never prove. This page is
about how you write the rules down, what you can build from them, and where they slip through your
fingers.
The signature: what we get to talk about
Before you can state any axioms you must fix a signature (or language) — the exact
list of symbols allowed in your sentences. For arithmetic it is small and familiar:
- a constant 0 — a name for zero;
-
a one-place function S — the successor,
"the next number after". So S(0) is our official name for
1, S(S(0)) is 2,
and in general S^n(0) is the numeral for
n;
- two two-place functions + and \cdot — addition and multiplication;
- a two-place relation < — the usual ordering.
Everything PA says must be built from these symbols, plus logic: variables
x, y, z, \dots, the connectives
\wedge, \vee, \neg, \Rightarrow, equality =,
and the first-order quantifiers \forall x ("for all
numbers x") and \exists x ("there is a number
x"). The phrase "first-order" is the crucial fine print: we may quantify
over numbers, but never over sets of numbers or properties. Hold on to
that restriction — it is the whole drama of the page.
The Peano axioms
Now we state the rules. The first pair pin down the successor S, making
it behave like "add one, starting a fresh chain":
-
Zero is nobody's successor:
\forall x\; \big(\, S(x) \neq 0 \,\big).
Counting up never loops back to 0 — there is a genuine starting point.
-
Successor is injective:
\forall x\,\forall y\; \big(\, S(x) = S(y) \;\Rightarrow\; x = y \,\big).
Different numbers have different "next" numbers, so the chain never merges.
The next axioms are the recursive defining equations for
+ and \cdot. Each operation gets a
base case (what happens against 0) and a step case
(how it passes through a successor):
-
Addition.
x + 0 = x, \qquad x + S(y) = S(x + y).
"Adding zero changes nothing; adding one-more-than-y is one-more-than
(adding y)."
-
Multiplication.
x \cdot 0 = 0, \qquad x \cdot S(y) = (x \cdot y) + x.
"Times zero is zero; times one-more-than-y is (times
y) plus one more copy of x."
These four equations are not arbitrary — they are exactly the clauses of a
recursive definition. They let
you compute any sum or product by peeling successors off the right-hand argument one at a
time until you hit the base case. There is nothing left to guess.
Worked example: unfolding 2 + 2 = 4
Let us prove something a five-year-old knows, but prove it from the axioms, so a machine
could follow along. Write the numerals in full: 2 = S(S(0)) and
4 = S(S(S(S(0)))). We want
S(S(0)) + S(S(0)) = S(S(S(S(0)))). Peel the successors off the second
argument using the step equation x + S(y) = S(x+y):
\begin{aligned}
S(S(0)) + S(S(0)) \;&=\; S\big(\, S(S(0)) + S(0) \,\big) && \text{step case, peel one } S\\
&=\; S\big(\, S(\, S(S(0)) + 0 \,) \,\big) && \text{step case again}\\
&=\; S\big(\, S(\, S(S(0)) \,) \,\big) && \text{base case } x+0=x\\
&=\; S(S(S(S(0)))) \;=\; 4.
\end{aligned}
Every line cites one axiom, applied to actual symbols — no appeal to intuition about "two apples
and two apples". That mechanical certainty is the prize we were after: arithmetic reduced to
symbol-pushing a computer could verify.
It looks obvious that no number is its own successor, but the two successor axioms alone do
not give it to you — you need induction (below). Take the
property \varphi(x) to be "S(x) \neq x".
Base: S(0) \neq 0 is precisely the first axiom.
Step: assume S(x) \neq x; if we had
S(S(x)) = S(x), injectivity would force
S(x) = x, contradicting the assumption — so
S(S(x)) \neq S(x). By induction,
S(x) \neq x holds for every x. Notice how
the "obvious" fact quietly needed the biggest axiom of all.
Induction — and why first-order PA needs a whole schema
The soul of arithmetic is mathematical induction: if a property holds of
0, and holds of S(x) whenever it holds of
x, then it holds of every number. Informally we would love to
write it as a single sentence quantifying over all properties \varphi:
\forall \varphi\;\Big[\, \big(\varphi(0) \,\wedge\, \forall x\,(\varphi(x) \Rightarrow \varphi(S(x)))\big)
\;\Rightarrow\; \forall x\,\varphi(x) \,\Big].
But that opening \forall \varphi quantifies over properties, not
numbers — it is second-order, and it is forbidden in a first-order language. So
first-order PA makes a compromise. It replaces the one grand axiom with an
induction schema: an infinite family of first-order axioms, one for each
formula \varphi(x) you can actually write in the language of
arithmetic.
For every formula \varphi(x) of the language of
arithmetic (possibly with extra parameters), PA includes the axiom
\big(\varphi(0) \,\wedge\, \forall x\,(\varphi(x) \Rightarrow \varphi(S(x)))\big)
\;\Rightarrow\; \forall x\,\varphi(x).
This is a real difference, not a technicality. The second-order axiom ranges over
all subsets of the number line — uncountably many properties, including ones no formula
could ever name. The first-order schema only covers the definable properties — countably
many. That gap between "all subsets" and "the ones we can write down" is precisely the crack through
which the strange models below will climb.
Models of PA: the standard model, and the impostors
A model of
PA is any structure — a set of "numbers" with its own 0,
S, +, \cdot,
< — in which every Peano axiom comes out true. The one you have in
mind is the standard model \mathbb{N} = \{0, 1, 2, 3, \dots\}:
a single chain marching off to infinity, every element reachable from 0
by finitely many successors.
Here is the shock. First-order PA has other models — non-standard
ones — that satisfy every one of its axioms yet are not \mathbb{N}.
They contain the entire standard chain 0, S0, SS0, \dots and then, sitting
beyond every numeral, extra elements: "numbers" c larger than
0, 1, 2, 3, \dots all at once. The successor function still works on them,
so each such c drags along its own two-way chain
\dots, c-2, c-1, c, c+1, c+2, \dots that never touches
0.
Why must such impostors exist? Because of the
compactness theorem.
Add to the language a new constant c and throw in the infinitely many
sentences c > 0,\; c > S(0),\; c > S(S(0)),\; \dots — "\(c\) is
bigger than every numeral". Any finite subset of these is satisfiable inside ordinary
\mathbb{N} (just pick c big enough), so by
compactness the whole infinite set has a model. In that model
c is a genuine element of a structure obeying all of PA, yet it is bigger
than 0, 1, 2, \dots — an infinite number living inside arithmetic.
The moral: first-order PA is not categorical. Its axioms cannot single out
\mathbb{N} up to isomorphism — they are outvoted by the non-standard
models they cannot exclude. This is a direct consequence of using the first-order induction
schema instead of the full second-order axiom.
Second-order Peano arithmetic: categorical, at a price
Give yourself the second-order induction axiom — the single one quantifying over all
properties — and the impostors vanish. Dedekind proved it: any two models of the full second-order
Peano axioms are isomorphic. Second-order PA is categorical; it pins down
\mathbb{N} exactly. So why don't we just use that and be done?
Because the power comes at a cost. Second-order logic has no complete, effective proof system: there
is no machine that can enumerate all its logical truths. The first-order schema was the compromise
that kept arithmetic mechanically checkable — the very goal we started with. You can have a
proof system a computer can run, or you can have categoricity, but the two together are out of
reach. First-order PA keeps the machine and pays with the non-standard models.
A common first reaction is to think a non-standard model must be broken — that its "infinite
numbers" secretly break some axiom. They don't. Every Peano axiom, including every instance
of the induction schema, is true in a non-standard model. Induction still holds
there; it just ranges over that model's own definable subsets, which do not include the set "the
standard numbers only" (that set exists outside the model, not inside its language).
So a non-standard model is not a mistake or a paradox — it is a perfectly legitimate structure
that first-order PA is simply unable to tell apart from \mathbb{N}. The
failure is not in the model; it is in the expressive power of a first-order theory.
A glance ahead: Gödel, and truths PA can't reach
The story does not stop at non-standard models. Gödel's
first incompleteness
theorem says that any consistent, effectively axiomatised theory strong enough to do
arithmetic — PA very much included — is incomplete: there are sentences
\sigma in its own language that are true of
\mathbb{N} but that PA can neither prove nor disprove.
These are not artificial logician's tricks, either. The Goodstein sequences — a
game where you keep rewriting a number in fancier and fancier base notation and subtracting one —
always crash down to zero eventually, a genuine theorem about ordinary integers; yet
"every Goodstein sequence terminates" is unprovable in PA. The
Paris–Harrington theorem, a strengthened finite Ramsey statement, is another true
arithmetic fact beyond PA's reach. Both are proved true using tools (transfinite induction up to
\varepsilon_0) that step outside PA — echoing, once again, that the
first-order rules of counting do not catch everything true about the numbers you meant.