Open any GPU's vertex shader and you will find it doing arithmetic in four
dimensions, even for a 3-D world. That fourth number is the homogeneous
coordinate w, and it is the single trick that lets one matrix type, a
4\times 4, rotate, scale and translate. Once you see why, the entire
transform pipeline collapses into "multiply by a matrix", over and over.
Points and directions, told apart by one number
Step 1 — append w to every 3-vector. A 3-D
quantity (x, y, z) becomes the 4-vector
(x, y, z, w). The value of w records
what kind of thing it is:
\text{point} = (x, y, z, 1), \qquad \text{direction} = (x, y, z, 0).
A point is a location, so it carries w = 1. A
direction ("two units east", a velocity, a surface normal) is the same arrow
wherever you stand — it has no location — so it carries w = 0.
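As a minimal sketch of Step 1, the tagging rule can be written as two small constructors. The helper names `point`, `direction`, `is_point` and `is_direction` are hypothetical, chosen here for illustration; plain tuples stand in for whatever vector type a real engine would use.

```python
def point(x, y, z):
    """Tag a location: append w = 1."""
    return (x, y, z, 1)


def direction(x, y, z):
    """Tag a direction (velocity, normal, "two units east"): append w = 0."""
    return (x, y, z, 0)


def is_point(v):
    """True when the 4-vector is tagged as a location."""
    return v[3] == 1


def is_direction(v):
    """True when the 4-vector is tagged as a direction."""
    return v[3] == 0


corner = point(2, 0, 5)    # a location in the world
east = direction(1, 0, 0)  # the same arrow wherever you stand
```

The first three components are untouched; only the appended w says which kind of quantity the tuple holds.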
Step 2 — build the 4\times 4 with translation in the last
column. The linear part (rotation and/or scale) M goes in
the top-left 3\times 3 block, the translation
\vec{t} in the last column, and the bottom row is
(0, 0, 0, 1):
T = \left[\begin{array}{ccc|c} & & & t_x \\ & M & & t_y \\ & & & t_z \\ \hline 0 & 0 & 0 & 1 \end{array}\right].
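The block layout above can be sketched in a few lines. The function name `make_transform` and the nested-list representation (a list of rows) are assumptions made for this example, not part of any particular graphics API.

```python
def make_transform(M, t):
    """Assemble a 4x4 (list of rows) from a 3x3 linear part M and a translation t.

    M fills the top-left 3x3 block, t fills the last column,
    and the bottom row is fixed at (0, 0, 0, 1).
    """
    return [
        [M[0][0], M[0][1], M[0][2], t[0]],
        [M[1][0], M[1][1], M[1][2], t[1]],
        [M[2][0], M[2][1], M[2][2], t[2]],
        [0, 0, 0, 1],
    ]


IDENTITY_3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# Pure translation by (5, 0, -2): M = I, so only the last column is non-trivial.
T = make_transform(IDENTITY_3, (5, 0, -2))
```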
Why the same matrix moves points but not directions
Step 3 — apply it to a point (w = 1). Take pure
translation (M = I) to isolate the effect, and multiply row by row:
\begin{bmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \begin{bmatrix} x + t_x \\ y + t_y \\ z + t_z \\ 1 \end{bmatrix}.
The translation column is multiplied by w = 1, so it
adds in full: the point slides by \vec{t}, and
w stays 1 — still a point.
Step 4 — feed the same matrix a direction
(w = 0).
\begin{bmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} d_x \\ d_y \\ d_z \\ 0 \end{bmatrix} = \begin{bmatrix} d_x \\ d_y \\ d_z \\ 0 \end{bmatrix}.
Step 5 — read off the result. Now the translation column is multiplied by
w = 0, so it contributes nothing — the direction
comes out unchanged. That is exactly right: you can move a location, but "east" is east
no matter where you stand. The single number w is what decides
whether the translation column gets to act.
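Steps 3 to 5 can be checked numerically. The sketch below assumes an illustrative translation of (5, 0, -2) and a hypothetical helper `mat_vec`; it feeds the same matrix a point and a direction that share their first three components.

```python
def mat_vec(A, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-vector, row by row."""
    return tuple(sum(A[i][k] * v[k] for k in range(4)) for i in range(4))


# Pure translation by (5, 0, -2): identity block plus a translation column.
T = [
    [1, 0, 0, 5],
    [0, 1, 0, 0],
    [0, 0, 1, -2],
    [0, 0, 0, 1],
]

p = (1, 2, 3, 1)  # point: w = 1
d = (1, 2, 3, 0)  # direction: w = 0

moved_point = mat_vec(T, p)      # (6, 2, 1, 1): slid by t, still a point
moved_direction = mat_vec(T, d)  # (1, 2, 3, 0): unchanged, still a direction
```

The only difference between the two inputs is w, and that alone decides whether the last column contributes.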
Step 6 — collect the payoff. Rotation and scale already live in that top-left
block, and now translation lives in the last column of the same matrix shape. So one
4\times 4 can express R, S and T, and chaining transforms
is just multiplying 4\times 4 matrices. Order matters: with the
column vectors used here, the rightmost matrix in a product acts first. The
whole pipeline becomes a stack of 4\times 4 multiplies, which is
precisely the operation GPUs are built to do by the million.
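To illustrate the payoff, here is a small sketch that folds a scale, a rotation and a translation into one matrix and compares it with applying the three one after another. The specific matrices (uniform scale by 2, a 90-degree rotation about z, a translation by (5, 0, -2)) and the helper names `mat_mul` and `mat_vec` are assumptions chosen for the example. Vectors are treated as columns, so the rightmost matrix acts first.

```python
def mat_mul(A, B):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def mat_vec(A, v):
    """Multiply a 4x4 matrix by a 4-vector."""
    return tuple(sum(A[i][k] * v[k] for k in range(4)) for i in range(4))


S = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 1]]   # scale by 2
R = [[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # 90 degrees about z
T = [[1, 0, 0, 5], [0, 1, 0, 0], [0, 0, 1, -2], [0, 0, 0, 1]]  # translate by (5, 0, -2)

# One combined matrix: scale first, then rotate, then translate.
model = mat_mul(T, mat_mul(R, S))

p = (1, 0, 0, 1)
step_by_step = mat_vec(T, mat_vec(R, mat_vec(S, p)))
combined = mat_vec(model, p)  # same result from a single multiply
```

Once `model` is built, every vertex costs one matrix-vector multiply regardless of how many transforms went into it.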
Embed 3-D space in 4-D by appending a coordinate w. Then:
- Every quantity is a 4-vector (x, y, z, w).
- A point has w = 1 (it has a location); a direction has w = 0 (it does not).
- One 4\times 4 matrix expresses rotation, scale and translation together; translation sits in the last column.
- Translation only touches points: the last column is multiplied by w, so it adds for w = 1 and vanishes for w = 0.
The w = 0 rule is not pedantry — get it wrong and your lighting
breaks. A surface normal is a direction: it says which way a face points, and
translating the whole object must not drag the normal off into the distance. Tag it
(n_x, n_y, n_z, 0) and the model matrix's translation column is
harmlessly zeroed — the normal rotates with the object but never slides.
(There is a subtlety waiting downstream: under a non-uniform scale a normal must be
transformed by the inverse-transpose of M, not
M itself, or it stops being perpendicular to the surface — a story
for transforming normals. But the first rule, before any of that, is simply: a direction
carries w = 0.)
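A short sketch of the normal case, under the assumption of a rigid model matrix (a 90-degree rotation about z plus a translation of (10, 0, 0), with no scale, so the inverse-transpose question does not arise). The matrix values and the helper `mat_vec` are illustrative only.

```python
def mat_vec(A, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-vector."""
    return tuple(sum(A[i][k] * v[k] for k in range(4)) for i in range(4))


# Model matrix: rotate 90 degrees about z, then translate by (10, 0, 0).
model = [
    [0, -1, 0, 10],
    [1,  0, 0,  0],
    [0,  0, 1,  0],
    [0,  0, 0,  1],
]

normal = (1, 0, 0, 0)     # correctly tagged: w = 0
mistagged = (1, 0, 0, 1)  # same arrow wrongly tagged as a point

rotated_normal = mat_vec(model, normal)   # (0, 1, 0, 0): rotated, not moved
dragged_normal = mat_vec(model, mistagged)  # (10, 1, 0, 1): translation leaked in
```

With w = 0 the normal comes out as a unit vector along y, as expected after the rotation. With w = 1 its first three components become (10, 1, 0), which is no longer a unit normal and would skew any lighting computed from it.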