A matrix transformation can rotate, scale and shear, but it can never translate, never slide the
whole plane sideways. The reason is built into what "linear" means. Homogeneous
coordinates are the clever fix: bolt one extra coordinate
w onto every vector, and translation and perspective projection
become ordinary matrix multiplication, just like the linear maps.
The trick: a 2-D point (x, y) is written as the 3-D vector
(x, y, 1), and a 2-D direction as
(x, y, 0).
Why a matrix can't translate
Step 1 — every linear map fixes the origin. Apply any matrix
M to the zero vector and you get the zero vector back:
M\,\vec{0} = \vec{0}.
Step 2 — so plain M\vec{x} can't move the origin.
A translation by \vec{t} = (t_x, t_y) must send
(0,0) to (t_x, t_y), which is not the origin
whenever \vec{t} \neq \vec{0}. No 2\times 2 matrix can do that. We are stuck.
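Steps 1 and 2 can be checked numerically. The following is a minimal sketch in plain Python; the helper name `mat_vec` and the three sample matrices are illustrative assumptions, not part of the lesson itself.

```python
def mat_vec(m, v):
    """Multiply a matrix (a list of rows) by a vector (a list of numbers)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# Three sample 2x2 linear maps: a 90-degree rotation, a scale and a shear.
rotation = [[0, -1],
            [1,  0]]
scale = [[2, 0],
         [0, 3]]
shear = [[1, 2],
         [0, 1]]

origin = [0, 0]
images = [mat_vec(m, origin) for m in (rotation, scale, shear)]
for image in images:
    print(image)  # each line prints [0, 0]: the origin never moves
```

Whatever entries the matrix holds, every term in each row sum is multiplied by a zero, so no choice of 2\times 2 matrix can send the origin anywhere else.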
Step 3 — lift into one higher dimension. Append a coordinate
w = 1 to the point. Now the "origin" we feed in is
(0, 0, 1) — not the true origin of the bigger space — so a
matrix is free to move it. Use a 3\times 3 matrix with the
translation tucked into its last column:
\begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}.
Step 4 — multiply it out. Row by row:
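\begin{bmatrix} 1\cdot x + 0\cdot y + t_x\cdot 1 \\ 0\cdot x + 1\cdot y + t_y\cdot 1 \\ 0\cdot x + 0\cdot y + 1\cdot 1 \end{bmatrix} = \begin{bmatrix} x + t_x \\ y + t_y \\ 1 \end{bmatrix}.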
The point landed at (x + t_x,\, y + t_y) — a genuine translation,
achieved purely by
matrix–vector multiplication.
The bottom coordinate stayed 1, so the result is still a point.
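Steps 3 and 4 translate directly into code. This is a sketch under simple assumptions: the function names `mat_vec` and `translation` and the sample numbers are made up for illustration.

```python
def mat_vec(m, v):
    """Multiply a matrix (a list of rows) by a vector (a list of numbers)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def translation(tx, ty):
    """3x3 homogeneous matrix with the translation in its last column."""
    return [[1, 0, tx],
            [0, 1, ty],
            [0, 0, 1]]

T = translation(5, -1)

point = [2, 3, 1]             # the 2-D point (2, 3), stored with w = 1
moved = mat_vec(T, point)
print(moved)                  # [7, 2, 1], i.e. the point (2 + 5, 3 - 1)

lifted_origin = [0, 0, 1]     # the 2-D origin, lifted to w = 1
print(mat_vec(T, lifted_origin))  # [5, -1, 1]: the origin itself has moved
```

The last line is the case that was impossible for a 2\times 2 matrix: the lifted origin (0, 0, 1) is not the zero vector of the bigger space, so the matrix is allowed to move it.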
Step 5 — a direction (w = 0) is left unmoved. Feed
the same matrix a direction vector
(d_x, d_y, 0) instead:
\begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} d_x \\ d_y \\ 0 \end{bmatrix} = \begin{bmatrix} d_x \\ d_y \\ 0 \end{bmatrix}.
The translation column is multiplied by w = 0, so it contributes
nothing. That is exactly right: points have a location to move, but a
direction ("two units east") is the same arrow wherever you stand —
translating it must do nothing. The single value of w encodes the
difference.
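The contrast between Step 4 and Step 5 can be seen by feeding one matrix both kinds of vector. As before, this is only an illustrative sketch; `mat_vec`, `translation` and the sample values are assumed names and numbers.

```python
def mat_vec(m, v):
    """Multiply a matrix (a list of rows) by a vector (a list of numbers)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def translation(tx, ty):
    """3x3 homogeneous matrix with the translation in its last column."""
    return [[1, 0, tx],
            [0, 1, ty],
            [0, 0, 1]]

T = translation(5, -1)

point = [2, 0, 1]       # a location: w = 1
direction = [2, 0, 0]   # "two units east": w = 0

print(mat_vec(T, point))      # [7, -1, 1]: the point is moved
print(mat_vec(T, direction))  # [2, 0, 0]: the direction is unchanged
```

The two inputs differ only in their last coordinate, and that alone decides whether the translation column takes part in the product.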
Scaling w: one point, many names
Step 6 — the homogeneous equivalence. Multiplying an entire homogeneous
vector by any nonzero \lambda names the same point:
(\lambda x,\; \lambda y,\; \lambda w) \;\equiv\; (x,\, y,\, w), \qquad \lambda \neq 0.
Step 7 — recover the Cartesian point by dividing through by
w. To read off where a homogeneous point with w \neq 0 actually sits,
divide the spatial coordinates by w. For a 3-D point, which carries one more spatial coordinate z:
(x,\, y,\, z,\, w) \;\longmapsto\; \left( \frac{x}{w},\; \frac{y}{w},\; \frac{z}{w} \right).
With w = 1 this is a no-op, which is why we store points with
w = 1. But a general w is the
secret behind perspective, described at the end of this section.
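Steps 6 and 7 together say that many homogeneous vectors name one Cartesian point. A minimal sketch, assuming a hypothetical helper `to_cartesian` that works for any dimension by treating the last entry as w:

```python
def to_cartesian(h):
    """Divide the spatial coordinates by the last coordinate w."""
    *spatial, w = h
    if w == 0:
        # A vector with w = 0 is a direction; it has no location to recover.
        raise ValueError("w = 0 is a direction, not a point")
    return tuple(c / w for c in spatial)

# Three names for the same 2-D point, differing by a nonzero factor lambda.
print(to_cartesian((1, 2, 1)))     # (1.0, 2.0)
print(to_cartesian((2, 4, 2)))     # (1.0, 2.0)
print(to_cartesian((-3, -6, -3)))  # (1.0, 2.0)

# The same helper handles the 3-D case (x, y, z, w).
print(to_cartesian((2, 4, 6, 2)))  # (1.0, 2.0, 3.0)
```

Raising an error for w = 0 is a design choice made for this sketch; the divide is simply undefined there, which matches the reading of w = 0 as a direction rather than a point.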
Embed \mathbb{R}^n in \mathbb{R}^{n+1} by appending a coordinate w. Then:
- A point is (x, y, z, 1) (w = 1); a direction is (x, y, z, 0) (w = 0).
- Scaling is free: (\lambda x, \lambda y, \lambda z, \lambda w) \equiv (x, y, z, w) for any \lambda \neq 0.
- The Cartesian point is recovered by the perspective divide (x, y, z, w) \mapsto (x/w,\, y/w,\, z/w).
- Translation becomes a matrix: put \vec{t} in the last column. It moves points (w = 1) and leaves directions (w = 0) untouched.
So far we have kept w = 1. The real magic happens when a matrix
sets w \neq 1. A perspective camera multiplies by a matrix whose
bottom row is not (0,0,0,1) but something like
(0, 0, -1, 0), so the output w ends up
equal to (minus) the depth z:
(x,\, y,\, z,\, 1) \;\longmapsto\; (x,\, y,\, z',\, -z) \;\longmapsto\; \left( -\frac{x}{z},\; -\frac{y}{z},\; \dots \right).
That final divide by w divides the x and y coordinates by the depth, so an
object twice as far away appears half as big. The perspective divide is
why distant things look small and why railway tracks appear to meet at the horizon. All
of it comes from one extra coordinate, carried along until the very last step.
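The halving with distance can be checked with a stripped-down perspective matrix. This is a sketch under stated assumptions: the matrix keeps x, y and z unchanged and only uses the bottom row (0, 0, -1, 0) from the text, whereas a real camera matrix would also rescale x and y and remap z. The camera is assumed to look down the negative z axis, so visible points have z < 0. The names `PERSPECTIVE` and `project` are illustrative.

```python
def mat_vec(m, v):
    """Multiply a matrix (a list of rows) by a vector (a list of numbers)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# Simplified perspective matrix: the bottom row copies -z into the output w.
PERSPECTIVE = [[1, 0,  0, 0],
               [0, 1,  0, 0],
               [0, 0,  1, 0],
               [0, 0, -1, 0]]

def project(point):
    """Apply the matrix, then perform the perspective divide on x and y."""
    x, y, z, w = mat_vec(PERSPECTIVE, point)
    # w equals minus the input depth; it is zero only for a point at z = 0.
    return (x / w, y / w)

near = project([1, 1, -2, 1])  # a point 2 units in front of the camera
far = project([1, 1, -4, 1])   # the same x and y, twice as far away

print(near)  # (0.5, 0.5)
print(far)   # (0.25, 0.25)
```

Doubling the distance from 2 to 4 halves both screen coordinates, which is the "twice as far, half as big" rule in numbers.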