Four-Vectors and Invariants
By now relativity can feel like a bag of separate tricks: clocks slow down, rulers shrink, velocities
add in a funny way, energy and momentum pick up factors of \gamma. Every
result seems to demand its own special formula, and every quantity you measure depends on who's looking.
There has to be a better way to organise all this — and there is. It is the single most powerful idea in
the whole subject: package quantities into four-vectors, and hunt for the invariants.
An invariant is a number that every observer agrees on, no matter how fast
they move — a fixed point in a world where lengths and times slosh around. Once you learn to build
quantities out of invariants, relativity stops being a minefield of frame-dependent corrections and
becomes almost easy: you compute the invariant in whichever frame is simplest, and its value is then
good in all frames. This page shows how to spot and use them, using the compact bookkeeping of
index notation and the Einstein summation convention.
The position four-vector
In ordinary space a point needs three numbers, (x, y, z), bundled into a
vector \vec r. In spacetime an event needs four — a time and a place —
so we bundle them into a four-vector. To keep the units consistent we again use
ct for the time slot:
x^\mu = (x^0, x^1, x^2, x^3) = (ct,\ x,\ y,\ z).
The Greek index \mu (mu) runs over 0, 1, 2, 3 — the
0 component is time, the other three are space. (Latin indices
i, j, k are reserved for just the space parts 1, 2, 3.)
The superscript is an index, not a power — x^2 here means "the second
component", the y-coordinate, not "x squared".
Writing it this way, the Lorentz
transformation becomes a single tidy matrix acting on x^\mu, the
exact four-dimensional analogue of rotating a vector.
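For concreteness, here is a sketch of that matrix for the simplest case, a boost with speed v along the x-axis. The symbol \Lambda^\mu{}_\nu and the shorthand \beta = v/c, \gamma = 1/\sqrt{1-\beta^2} are standard notation but are assumed here rather than defined on this page.

```latex
x'^\mu = \Lambda^\mu{}_\nu\, x^\nu,
\qquad
\Lambda^\mu{}_\nu =
\begin{pmatrix}
\gamma & -\gamma\beta & 0 & 0 \\
-\gamma\beta & \gamma & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
```

Written out, the first two rows give ct' = \gamma(ct - \beta x) and x' = \gamma(x - \beta\, ct), while y and z are unchanged.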
The Minkowski metric and the invariant interval
In ordinary space the length of a vector comes from the Pythagorean dot product,
\vec r \cdot \vec r = x^2 + y^2 + z^2, and that length doesn't change when you
rotate your axes — it's a rotational invariant. Spacetime has its own "length", but with a crucial twist:
the time part enters with the opposite sign. The recipe for combining components is
stored in the Minkowski metric \eta_{\mu\nu} (eta), a
4\times 4 array:
\eta_{\mu\nu} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}.
The metric tells you how to take the "dot product" of two four-vectors. Using the
Einstein summation convention — a repeated index, once up and once down, is
automatically summed over 0,1,2,3 — the spacetime length-squared of
x^\mu is
s^2 = \eta_{\mu\nu}\,x^\mu x^\nu = (x^0)^2 - (x^1)^2 - (x^2)^2 - (x^3)^2 = (ct)^2 - x^2 - y^2 - z^2.
This number s^2 is the invariant interval, and its magic is
that every observer computes the same value, even though they disagree about
t and about x separately. Time dilation and length
contraction are precisely the trade-off that keeps this combination fixed: when one observer's
t stretches, their x shifts by just enough to leave
(ct)^2 - x^2 untouched. It is the spacetime version of the fact that rotating
a rod changes its shadow on the wall but not its true length.
- s^2 = \eta_{\mu\nu}\,x^\mu x^\nu = (ct)^2 - x^2 - y^2 - z^2 is the same in every inertial frame.
- s^2 > 0: timelike separation — a possible cause-and-effect link; \sqrt{s^2}/c is the proper time a clock would tick between the events.
- s^2 < 0: spacelike separation, with no causal link; \sqrt{-s^2} is the proper distance.
- s^2 = 0: lightlike separation; the events are connected by a light ray.
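The three cases above can be sketched in a few lines of Python. This is a minimal illustration, not a library: the function names, the tolerance, and the sample separations are all made up for the example, and lengths are measured in light-seconds so that ct is just a number of seconds.

```python
def interval_squared(ct, x, y=0.0, z=0.0):
    """s^2 = (ct)^2 - x^2 - y^2 - z^2, with the (+,-,-,-) metric used on this page."""
    return ct**2 - x**2 - y**2 - z**2


def classify(s2, tol=1e-12):
    """Label a separation by the sign of s^2 (tol absorbs float round-off)."""
    if s2 > tol:
        return "timelike"
    if s2 < -tol:
        return "spacelike"
    return "lightlike"


# Illustrative separations, in units where lengths are light-seconds.
print(classify(interval_squared(5.0, 4.0)))   # timelike
print(classify(interval_squared(3.0, 4.0)))   # spacelike
print(classify(interval_squared(4.0, 4.0)))   # lightlike
```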
Invariance made visible: hyperbolas, not circles
In ordinary geometry, "all points a fixed distance from the origin" trace a circle —
and rotating your axes slides points around that circle without changing their distance. Minkowski
geometry does the same thing, but because of the minus sign the curve of constant interval is a
hyperbola. A Lorentz "boost" (changing to a moving frame) slides events along these
hyperbolas, leaving s^2 fixed.
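To watch the sliding happen numerically, here is a minimal Python sketch. It assumes motion along x only and parameterises the boost by a rapidity, a quantity this page does not introduce: tanh(rapidity) = v/c, so cosh(rapidity) plays the role of \gamma. The name boost_x and the sample event are illustrative.

```python
import math


def boost_x(ct, x, rapidity):
    """Boost along x, written as a hyperbolic 'rotation' by the given rapidity."""
    ch, sh = math.cosh(rapidity), math.sinh(rapidity)
    return ch * ct - sh * x, ch * x - sh * ct


ct, x = 5.0, 4.0                      # illustrative event with s^2 = 9
for rapidity in (0.0, 0.5, 1.0, -1.5):
    ct_b, x_b = boost_x(ct, x, rapidity)
    # The coordinates change, but (ct)^2 - x^2 stays on the same hyperbola.
    print(f"{rapidity:5.1f}  ct'={ct_b:8.4f}  x'={x_b:8.4f}  s^2={ct_b**2 - x_b**2:.6f}")
```

Each row lists different coordinates for the same event, yet the last column should agree up to rounding. The cosh and sinh here do the job that cos and sin do in an ordinary rotation.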
The light cone is the boundary case, the hyperbola's own asymptote, where s^2 = 0.
Timelike events sit inside it (on up/down branches), spacelike events outside it (left/right branches).
This one picture is the geometry of special relativity: replace "circle" with "hyperbola" and
"rotation" with "boost", and all your Euclidean intuition transfers over.
The energy–momentum four-vector
Here's where four-vectors pay off spectacularly. Just as position bundles time and space, we can bundle
energy and momentum into one four-vector, the four-momentum:
p^\mu = \left(\frac{E}{c},\ p_x,\ p_y,\ p_z\right).
Energy sits in the time slot (over c, to fix the units) and ordinary momentum
fills the three space slots. Now feed it through the very same metric to build its invariant:
\eta_{\mu\nu}\,p^\mu p^\nu = \left(\frac{E}{c}\right)^2 - p_x^2 - p_y^2 - p_z^2 = \frac{E^2}{c^2} - |\vec p|^2 = (mc)^2.
Rearranged, that is exactly the energy–momentum relation E^2 = (pc)^2 + (mc^2)^2
from the last page — but now we see it for what it really is: the length-squared of the
four-momentum, and that length is the mass. Mass is simply the invariant "spacetime magnitude"
of a particle's energy–momentum, the one number about it that no observer can change. Every frame sees a
different E and a different \vec p, but they all
reconstruct the same (mc)^2.
- The four-momentum p^\mu = (E/c,\ \vec p) has invariant squared-length \eta_{\mu\nu}p^\mu p^\nu = (mc)^2.
- Equivalently E^2 - (pc)^2 = (mc^2)^2: the mass is frame-independent even though E and \vec p are not.
- Four-momentum is conserved in interactions — all four components at once — which is why particle physicists track it obsessively.
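As a rough sketch, the invariant in these bullets translates directly into code. The function below is hypothetical and assumes energies and momenta are supplied in a common energy unit (E and the components of pc, for example in GeV), so no explicit factors of c appear.

```python
import math


def invariant_mass(E, px, py, pz):
    """Return mc^2 from E and the components of pc, all in the same energy unit.

    Uses E^2 - |pc|^2 = (mc^2)^2, the squared length of the four-momentum.
    """
    m2 = E**2 - (px**2 + py**2 + pz**2)
    if m2 < 0:
        raise ValueError("E^2 < |pc|^2: not a timelike or null four-momentum")
    return math.sqrt(m2)


# Illustrative numbers only (GeV): momentum split across two axes.
print(invariant_mass(13.0, 3.0, 4.0, 0.0))   # 12.0
```

With real measured values, rounding can push a massless particle's m2 slightly below zero, so a practical version would probably want a small tolerance before raising.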
Why invariants make relativity easy
The strategy is always the same, and it turns hard problems into one-liners:
- Build the invariant, then evaluate it in the easiest frame. Want a particle's mass? Compute E^2 - (pc)^2 in the lab frame; it equals (mc^2)^2, the value in the rest frame where the maths is trivial. No Lorentz transforming required.
- Conservation laws become four equations for the price of one. "Total four-momentum in = total four-momentum out" packs energy conservation and all three momentum conservations into a single vector statement \sum p^\mu_{\text{in}} = \sum p^\mu_{\text{out}}.
- If a quantity is a four-vector, its invariant is automatically frame-independent. You get an agreed-upon number for free, without checking frame by frame.
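Here is a small sketch of the first two points working together. It assumes a made-up decay into two photons flying back to back; the numbers and helper names are hypothetical. Because four-momentum is conserved, the sum over the products equals the parent's four-momentum, and its invariant length gives the parent's mass.

```python
import math


def add_four_momenta(*momenta):
    """Component-wise sum of four-momenta given as (E, px, py, pz) tuples."""
    return tuple(sum(components) for components in zip(*momenta))


def mass(p):
    """mc^2 from the squared length E^2 - |pc|^2 of a four-momentum."""
    E, px, py, pz = p
    return math.sqrt(E**2 - px**2 - py**2 - pz**2)


# Hypothetical decay products (GeV): two photons flying back to back along x.
photon_1 = (1.5, 1.5, 0.0, 0.0)
photon_2 = (1.5, -1.5, 0.0, 0.0)

total = add_four_momenta(photon_1, photon_2)
print(total)        # (3.0, 0.0, 0.0, 0.0)
print(mass(total))  # 3.0
```

Each photon has zero invariant mass on its own, but the pair does not: the invariant belongs to the total four-momentum, not to the individual pieces.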
This is why professionals think in four-vectors. The "special formulas" for time dilation and length
contraction are still true, but you rarely need them: the invariant does the heavy lifting.
Worked examples
Example 1 — proper time from the interval. Two ticks of a spaceship's clock happen at
the same place on the ship. In the ground frame they are separated by \Delta t = 5\ \text{s},
and the ship travels \Delta x = 4 light-seconds between them.
The invariant interval, computed in the ground frame, is
s^2 = (c\Delta t)^2 - \Delta x^2 = (c \cdot 5)^2 - (4c)^2 = c^2(25 - 16) = 9c^2.
The proper time is \Delta\tau = \sqrt{s^2}/c = 3\ \text{s} — exactly what the
ship's own clock reads, and every observer agrees, because s^2 is invariant.
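As a cross-check, the same answer can be reached the long way, by transforming into the ship frame. The sketch below takes the 5 s and 4 light-seconds as ground-frame separations, as the interval calculation above does, works in units where c = 1, and uses the standard Lorentz transformation for motion along x. The variable names are illustrative.

```python
import math

# Ground-frame separations from Example 1, in units where c = 1
# (times in seconds, distances in light-seconds).
dt, dx = 5.0, 4.0

beta = dx / dt                          # ship speed as a fraction of c: 0.8
gamma = 1.0 / math.sqrt(1.0 - beta**2)  # Lorentz factor, 5/3

# Standard Lorentz transformation into the ship frame.
dt_ship = gamma * (dt - beta * dx)
dx_ship = gamma * (dx - beta * dt)

print(dt_ship, dx_ship)                 # approximately 3.0 and 0.0
print(dt_ship**2 - dx_ship**2)          # approximately 9.0, matching s^2 / c^2
```

The transformation confirms that the ticks happen at the same place on the ship and 3 s apart, but the interval got there in one line without needing \gamma at all.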
Example 2 — mass of an unknown particle. A detector measures a particle with total
energy E = 5\ \text{GeV} and momentum pc = 4\ \text{GeV}.
Its invariant mass follows immediately from the four-momentum's length:
mc^2 = \sqrt{E^2 - (pc)^2} = \sqrt{5^2 - 4^2} = \sqrt{9} = 3\ \text{GeV}.
We never needed the particle's speed. Any lab, moving any way, measuring its own
E and p, reconstructs the same
3\ \text{GeV} mass — that's how new particles are identified from the debris of
collisions.
Example 3 — a photon has zero length. For light, E = pc, so
its four-momentum invariant is E^2 - (pc)^2 = 0, giving
m = 0. The photon's four-momentum is a lightlike (null) four-vector — a
nonzero vector with zero length, something impossible in ordinary Euclidean space but perfectly natural once
the metric carries a minus sign.
You can put the minus sign on the time component instead, and some books do: they use the "mostly plus" metric \eta_{\mu\nu} = \text{diag}(-1,+1,+1,+1),
which just flips the overall sign of every interval (timelike becomes negative). What you can
never do is make all four signs the same. The single relative minus sign is the
entire physical content — it's what makes spacetime Lorentzian rather than Euclidean, what
separates "time" from "space", what bends circles into hyperbolas, and what enforces the cosmic speed
limit. Erase the minus and you erase relativity: with four plus signs, "boosts" would be ordinary
rotations, there'd be no light cone, no causal order, and no maximum speed. That one minus sign is doing
the work of the whole theory. (Just pick a convention and stick with it — mixing them mid-calculation is
a classic way to lose a sign.)
The superscript on x^\mu is a label, not an exponent. This
trips up everyone at first. In x^\mu = (ct, x, y, z), the symbol
x^2 means "component number 2" (the y-coordinate),
not "x squared". Two habits keep you safe:
- When you genuinely square a component, wrap it: write (x^1)^2 for "the x-component, squared". The interval is (x^0)^2 - (x^1)^2 - (x^2)^2 - (x^3)^2, parentheses and all.
- The Einstein summation only fires when an index appears once up and once down (x^\mu x_\mu), which is why the metric \eta_{\mu\nu} (two down-indices) is there to "lower" one of them. Two up indices written side by side, x^\mu x^\mu, are not a valid contraction: you need the metric in between.
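The index bookkeeping can be made concrete with a short sketch in plain Python. The names ETA, lower and contract are made up for this example, and the component values are illustrative. The point is that the sum over a repeated index is an ordinary loop, and that the metric is what turns an up-index vector into a down-index one.

```python
# Minkowski metric with the (+,-,-,-) signature used on this page.
ETA = [
    [1, 0, 0, 0],
    [0, -1, 0, 0],
    [0, 0, -1, 0],
    [0, 0, 0, -1],
]


def lower(x_up):
    """x_mu = eta_{mu nu} x^nu: sum over the repeated index nu."""
    return [sum(ETA[mu][nu] * x_up[nu] for nu in range(4)) for mu in range(4)]


def contract(a_up, b_up):
    """a^mu b_mu: lower one index with the metric, then sum over mu."""
    b_down = lower(b_up)
    return sum(a_up[mu] * b_down[mu] for mu in range(4))


x_up = [5, 4, 0, 0]            # illustrative (ct, x, y, z)
print(lower(x_up))             # [5, -4, 0, 0]
print(contract(x_up, x_up))    # 9
```

Lowering the index flips the sign of the space components, which is exactly where the minus signs in the interval come from.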