Homogeneous Coordinates

A matrix transformation can rotate, scale and shear, but it can never translate: it cannot slide the whole plane sideways. The reason is built into what "linear" means. Homogeneous coordinates are the clever fix: bolt one extra coordinate w onto every vector, and translation and perspective projection both become ordinary matrix multiplication, just like rotation and scaling.

The trick: a 2-D point (x, y) is written as the 3-D vector (x, y, 1), and a 2-D direction as (x, y, 0).

Why a matrix can't translate

Step 1 — every linear map fixes the origin. Apply any matrix M to the zero vector and you get the zero vector back:

M\,\vec{0} = \vec{0}.

Step 2 — so plain M\vec{x} can't move the origin. A translation by \vec{t} = (t_x, t_y) must send (0,0) to (t_x, t_y) — a non-zero place. No 2\times 2 matrix can do that. We are stuck.
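Writing out a general 2\times 2 matrix, with entries a, b, c, d chosen here only for illustration, makes Steps 1 and 2 concrete: every entry ends up multiplied by a zero, so no choice of entries can produce a nonzero translation.

\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} 0 \\ 0 \end{bmatrix} = \begin{bmatrix} a \cdot 0 + b \cdot 0 \\ c \cdot 0 + d \cdot 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \neq \begin{bmatrix} t_x \\ t_y \end{bmatrix} \quad \text{whenever } \vec{t} \neq \vec{0}.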

Step 3 — lift into one higher dimension. Append a coordinate w = 1 to the point. Now the "origin" we feed in is (0, 0, 1) — not the true origin of the bigger space — so a matrix is free to move it. Use a 3\times 3 matrix with the translation tucked into its last column:

\begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}.

Step 4 — multiply it out. Row by row:

= \begin{bmatrix} x + t_x \\ y + t_y \\ 1 \end{bmatrix}.

The point landed at (x + t_x,\, y + t_y) — a genuine translation, achieved purely by matrix–vector multiplication. The bottom coordinate stayed 1, so the result is still a point.
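Steps 3 and 4 can be checked numerically. The sketch below is a minimal illustration in plain Python with no libraries; the helper names translation and mat_vec are hypothetical, chosen for this example only.

```python
# Minimal sketch: 2-D translation as a 3x3 matrix acting on homogeneous vectors.
# Helper names are hypothetical; plain lists keep the example dependency-free.

def translation(tx, ty):
    """3x3 matrix with the translation (tx, ty) in its last column."""
    return [
        [1, 0, tx],
        [0, 1, ty],
        [0, 0, 1],
    ]


def mat_vec(m, v):
    """Multiply matrix m (a list of rows) by the column vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


T = translation(5, -2)
p = [3, 4, 1]          # the point (3, 4), stored with w = 1
print(mat_vec(T, p))   # [8, 2, 1]: moved to (8, 2), still a point
```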

Step 5 — a direction (w = 0) is left unmoved. Feed the same matrix a direction vector (d_x, d_y, 0) instead:

\begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} d_x \\ d_y \\ 0 \end{bmatrix} = \begin{bmatrix} d_x \\ d_y \\ 0 \end{bmatrix}.

The translation column is multiplied by w = 0, so it contributes nothing. That is exactly right: points have a location to move, but a direction ("two units east") is the same arrow wherever you stand — translating it must do nothing. The single value of w encodes the difference.
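A small sketch of Step 5, using the same hypothetical helpers (redefined so the block runs on its own). The two inputs differ only in w, and only the point moves.

```python
# Sketch: the same translation matrix applied to a point (w = 1)
# and to a direction (w = 0). Helper names are hypothetical.

def translation(tx, ty):
    """3x3 matrix with the translation (tx, ty) in its last column."""
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]


def mat_vec(m, v):
    """Multiply matrix m (a list of rows) by the column vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


T = translation(5, -2)
point = [3, 4, 1]       # w = 1: a location
direction = [3, 4, 0]   # w = 0: an arrow, "3 right and 4 up"

print(mat_vec(T, point))      # [8, 2, 1]: the point moved
print(mat_vec(T, direction))  # [3, 4, 0]: the last column was multiplied by 0
```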

Scaling w: one point, many names

Step 6 — the homogeneous equivalence. Multiplying an entire homogeneous vector by any nonzero \lambda names the same point:

(\lambda x,\; \lambda y,\; \lambda w) \;\equiv\; (x,\, y,\, w), \qquad \lambda \neq 0.
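One way to test the equivalence of Step 6 without dividing by anything is sketched below; it is an illustration, not a standard library routine. Two nonzero vectors are scalar multiples of each other exactly when every 2\times 2 cross term a_i b_j - a_j b_i vanishes. The function name same_point is hypothetical.

```python
# Sketch: do two homogeneous vectors name the same point?
# same_point is a hypothetical helper written for this example.
from itertools import combinations


def same_point(a, b):
    """True if nonzero vectors a and b are scalar multiples of each other.

    Proportional vectors have every cross term a[i]*b[j] - a[j]*b[i] equal
    to zero, so no division by w is needed.
    """
    if not any(a) or not any(b):
        raise ValueError("the zero vector names no point")
    return all(a[i] * b[j] == a[j] * b[i]
               for i, j in combinations(range(len(a)), 2))


p = (2, 3, 1)
print(same_point(p, (4, 6, 2)))          # True: lambda = 2
print(same_point(p, (-1, -1.5, -0.5)))   # True: lambda = -0.5
print(same_point(p, (2, 3, 2)))          # False: that is the point (1, 1.5)
```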

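Step 7 — recover the Cartesian point by dividing through by w. To read off where a homogeneous point actually sits, divide its other coordinates by w. In 2-D that sends (x, y, w) to (x/w, y/w); the same rule for a 3-D point, the form used from here on, is: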

(x,\, y,\, z,\, w) \;\longmapsto\; \left( \frac{x}{w},\; \frac{y}{w},\; \frac{z}{w} \right).

With w = 1 this is a no-op, which is why we store points with w = 1. But a general w is the secret behind perspective — the vignette below.
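The divide of Step 7 is a one-liner. This sketch (the function name to_cartesian is hypothetical) works in any dimension by treating the last entry as w, and it refuses w = 0, since a direction has no Cartesian location.

```python
# Sketch of the perspective divide; to_cartesian is a hypothetical name.

def to_cartesian(h):
    """(x, y, ..., w) -> (x/w, y/w, ...). Requires w != 0."""
    *coords, w = h
    if w == 0:
        raise ValueError("w = 0 is a direction, not a point")
    return tuple(c / w for c in coords)


print(to_cartesian((1, 2, 3, 1)))   # (1.0, 2.0, 3.0): w = 1, a no-op
print(to_cartesian((2, 4, 6, 2)))   # (1.0, 2.0, 3.0): same point, scaled by 2
```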

Embed \mathbb{R}^n in \mathbb{R}^{n+1} by appending a coordinate w. Then, writing n = 3:

A point is (x, y, z, 1) (w = 1); a direction is (x, y, z, 0) (w = 0).
Scaling is free: (\lambda x, \lambda y, \lambda z, \lambda w) \equiv (x, y, z, w) for any \lambda \neq 0.
The Cartesian point is recovered by the perspective divide (x, y, z, w) \mapsto (x/w,\, y/w,\, z/w).
Translation becomes a matrix: put \vec{t} in the last column. It moves points (w = 1) and leaves directions (w = 0) untouched.
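The four summary rules can be exercised together in 3-D. In this minimal sketch, with hypothetical helper names, a 4\times 4 translation moves the point, leaves the direction alone, and gives the same Cartesian answer when the input point is first rescaled by \lambda = 5.

```python
# Sketch: 3-D translation as a 4x4 matrix. Helper names are hypothetical.

def translation3d(t):
    """4x4 matrix with the translation t = (tx, ty, tz) in its last column."""
    tx, ty, tz = t
    return [
        [1, 0, 0, tx],
        [0, 1, 0, ty],
        [0, 0, 1, tz],
        [0, 0, 0, 1],
    ]


def mat_vec(m, v):
    """Multiply matrix m (a list of rows) by the column vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def to_cartesian(h):
    """Perspective divide (x, y, z, w) -> (x/w, y/w, z/w); assumes w != 0."""
    *coords, w = h
    return tuple(c / w for c in coords)


T = translation3d((10, 0, -1))
point = [1, 2, 3, 1]
direction = [1, 2, 3, 0]
scaled = [5 * c for c in point]   # the same point, with lambda = 5

print(mat_vec(T, point))                 # [11, 2, 2, 1]
print(mat_vec(T, direction))             # [1, 2, 3, 0]
print(to_cartesian(mat_vec(T, scaled)))  # (11.0, 2.0, 2.0)
```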

So far we have kept w = 1. The real magic happens when a matrix sets w \neq 1. A perspective camera multiplies by a matrix whose bottom row is not (0,0,0,1) but something like (0, 0, -1, 0), so the output w ends up equal to (minus) the depth z:

(x,\, y,\, z,\, 1) \;\longmapsto\; (x,\, y,\, z',\, -z) \;\longmapsto\; \left( -\frac{x}{z},\; -\frac{y}{z},\; \dots \right).

That final divide by w divides screen position by depth, so an object twice as far away appears half as big. The perspective divide is why distant things look small and why parallel railway tracks appear to meet at the horizon. All of it comes from one extra coordinate, carried along until the very last step.
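A sketch of the perspective step, under deliberately simple assumptions: the matrix P below is hypothetical, keeps x and y unscaled, and leaves z unchanged in its third row. A real projection matrix, such as OpenGL's, also applies a field-of-view scale and remaps z using near and far planes. The bottom row (0, 0, -1, 0) sets w = -z, and the camera is assumed to look down the negative z-axis, so visible points have z < 0.

```python
# Sketch: a hypothetical, stripped-down perspective matrix.

def mat_vec(m, v):
    """Multiply matrix m (a list of rows) by the column vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# Bottom row (0, 0, -1, 0) makes the output w equal to -z.
# The third row is identity here; real matrices remap depth (omitted).
P = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, -1, 0],
]


def project(point3):
    """Project a 3-D point (z < 0) to 2-D screen coordinates."""
    x, y, z = point3
    hx, hy, _, w = mat_vec(P, [x, y, z, 1])
    return (hx / w, hy / w)   # the perspective divide


print(project((1, 1, -2)))   # (0.5, 0.5)
print(project((1, 1, -4)))   # (0.25, 0.25): twice as far, half as big

# Two parallel rails at x = -1 and x = +1: their screen gap shrinks like 1/depth.
for depth in (1, 10, 100):
    print(depth, project((-1, -1, -depth)), project((1, -1, -depth)))
```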