The Perspective Projection Matrix
We have the law of perspective —
project through the eye,
and a point shrinks by n/z. We have a pipeline that does everything by
multiplying 4×4 matrices.
Now we must reconcile them, and there is a problem staring us down: the perspective rule
divides by z, and a matrix multiply can only add and scale —
it cannot divide one coordinate by another. This page is the trick that squares that circle, and
it is the centrepiece of the whole stage.
Stashing the depth in w
Step 1 — state what we want. After projection the screen coordinate should be the
frustum-scaled x divided by depth. Folding the frustum bounds
(\text{fov}, a, n, f) into the
scale-and-translate
constants, write the near-plane scales s_x = \dfrac{1}{a\,\tan(\text{fov}/2)}
and s_y = \dfrac{1}{\tan(\text{fov}/2)}. Writing z for the positive distance in front of the camera for the moment, we want
x_{\text{screen}} = \frac{s_x\,x}{z}, \qquad y_{\text{screen}} = \frac{s_y\,y}{z}.
Step 2 — face the obstacle. A matrix row computes a linear combination
of the inputs, a x + b y + c z + d w. There is no way for one row to put
a z in the denominator. Multiplication alone can't divide.
Step 3 — the trick: borrow the divide that already exists. The pipeline performs
one division for free, after every projection: the perspective divide, where the
homogeneous output (x, y, z, w) is collapsed to a 3-vector by dividing
through by w. So we don't compute x/z
ourselves: we arrange for the matrix to emit the depth as w, and let the
downstream divide do the work.
Step 4 — set the bottom row to copy depth into w.
The output w_{\text{clip}} is the bottom row dotted with the input. We use
the convention that the camera looks down -z, so visible points have
negative z and their positive depth is -z. To make
w_{\text{clip}} equal that positive depth, the bottom row is
\text{(bottom row)} = \begin{bmatrix} 0 & 0 & -1 & 0 \end{bmatrix}, \qquad w_{\text{clip}} = -z.
This is the single most important row in real-time graphics: it does nothing to
x, y, z, and quietly copies the depth into the output's
w.
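As a quick check of that bottom row, here is a minimal Python sketch. The helper name `dot4` and the sample point are illustrative assumptions, not taken from any particular library.

```python
def dot4(row, v):
    """Dot product of a 4-element matrix row with a 4-vector."""
    return sum(r * c for r, c in zip(row, v))

bottom_row = [0.0, 0.0, -1.0, 0.0]

# Camera-space point 5 units in front of a camera looking down -z.
point = [2.0, 1.0, -5.0, 1.0]

w_clip = dot4(bottom_row, point)
print(w_clip)  # 5.0
```

The x and y of the input play no part in the result: only the negated z survives into w.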
Building the matrix entry by entry
Step 5 — the x and y rows.
We want x_{\text{clip}} = s_x x (so that after the divide by
w = -z we get the 1/z shrink). So the top row
is just s_x on the diagonal, and likewise s_y
for y:
x_{\text{clip}} = s_x\,x, \qquad y_{\text{clip}} = s_y\,y.
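A small Python sketch of these two rows may help. It assumes that fov is the vertical field of view in radians and that a is width divided by height; the function name `frustum_scales` and the sample values are hypothetical.

```python
import math

def frustum_scales(fov_y_radians, aspect):
    """Return (s_x, s_y), assuming a vertical fov and aspect = width / height."""
    t = math.tan(fov_y_radians / 2.0)
    return 1.0 / (aspect * t), 1.0 / t

s_x, s_y = frustum_scales(math.radians(60.0), 16.0 / 9.0)

x, y = 1.0, 1.0
x_clip, y_clip = s_x * x, s_y * y  # no z appears in either product

# The shrink only shows up once something divides by w = -z.
for z in (-2.0, -4.0):
    print(x_clip / -z, y_clip / -z)
```

Doubling the depth halves both printed values, even though x_clip and y_clip themselves never changed.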
Step 6 — the depth row must remap, non-linearly. The third row produces
z_{\text{clip}}, and after the divide we need
z_{\text{ndc}} = z_{\text{clip}}/w \in [-1, 1] as
z runs from -n to -f. Crucially, the divide by
w = -z means a row proportional to z alone would
divide out to a constant. To hit both endpoints after the divide,
the row must produce a term in z and a constant term. Write the
third row as \begin{bmatrix} 0 & 0 & A & B \end{bmatrix}, so
z_{\text{clip}} = A z + B and
z_{\text{ndc}} = \frac{A z + B}{-z}.
Demanding z_{\text{ndc}} = -1 at z = -n and
z_{\text{ndc}} = +1 at z = -f and solving the
two equations gives
A = -\frac{f + n}{f - n}, \qquad B = -\frac{2 f n}{f - n}.
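Spelled out as a worked check of that result: substituting z = -n and z = -f into z_{\text{ndc}} = (A z + B)/(-z) gives
\frac{-A n + B}{n} = -1, \qquad \frac{-A f + B}{f} = +1,
that is, -A n + B = -n and -A f + B = f. Subtracting the first from the second gives A (n - f) = f + n, so
A = -\frac{f + n}{f - n}, \qquad B = A n - n = \frac{-n(f + n) - n(f - n)}{f - n} = -\frac{2 f n}{f - n}.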
Because B \neq 0, the mapping z \mapsto z_{\text{ndc}}
is a non-linear 1/z curve — most NDC depth precision
bunches up near the camera (the origin of "z-fighting" far away).
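To see the bunching numerically, here is a short Python sketch that evaluates the depth remap at a few depths. The function names and the choice n = 1, f = 100 are assumptions made for illustration.

```python
def depth_coefficients(n, f):
    """Return (A, B) for the third row [0, 0, A, B] of the matrix."""
    A = -(f + n) / (f - n)
    B = -2.0 * f * n / (f - n)
    return A, B

def z_ndc(z, n, f):
    """NDC depth of a camera-space z (negative in front of the camera)."""
    A, B = depth_coefficients(n, f)
    return (A * z + B) / (-z)

n, f = 1.0, 100.0
for depth in (1.0, 2.0, 10.0, 50.0, 100.0):
    print(f"depth {depth:6.1f} -> z_ndc {z_ndc(-depth, n, f):+.4f}")
```

With these values, depth 2 already maps to about 0.01, so roughly half of the [-1, 1] range is spent on the first unit of depth past the near plane, and the remaining 98 units share the other half.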
Step 7 — assemble the full matrix. Stacking the four rows:
P = \begin{bmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & -\dfrac{f+n}{f-n} & -\dfrac{2fn}{f-n} \\ 0 & 0 & -1 & 0 \end{bmatrix}.
Step 8 — send a vertex through. Take a camera-space point
(x, y, z, 1) and multiply:
P\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \begin{bmatrix} s_x\,x \\ s_y\,y \\ A z + B \\ -z \end{bmatrix} = \begin{bmatrix} x_{\text{clip}} \\ y_{\text{clip}} \\ z_{\text{clip}} \\ w_{\text{clip}} \end{bmatrix}.
Look at the last component: w_{\text{clip}} = -z. The matrix did not
shrink anything — it merely positioned the depth in w, ready
for the divide that comes next.
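Steps 7 and 8 can be sketched in plain Python as follows. This is a minimal illustration, assuming a vertical fov in radians and a row-major list-of-rows layout; `perspective` and `mat_vec` are hypothetical helper names rather than any library's API.

```python
import math

def perspective(fov_y_radians, aspect, n, f):
    """Build P as a list of four rows, following the layout of Step 7."""
    t = math.tan(fov_y_radians / 2.0)
    s_x, s_y = 1.0 / (aspect * t), 1.0 / t
    A = -(f + n) / (f - n)
    B = -2.0 * f * n / (f - n)
    return [
        [s_x, 0.0, 0.0, 0.0],
        [0.0, s_y, 0.0, 0.0],
        [0.0, 0.0, A, B],
        [0.0, 0.0, -1.0, 0.0],
    ]

def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

P = perspective(math.radians(90.0), 1.0, 1.0, 10.0)
x_clip, y_clip, z_clip, w_clip = mat_vec(P, [1.0, 2.0, -5.0, 1.0])
print(x_clip, y_clip, z_clip, w_clip)
```

For this vertex at z = -5, w_clip comes out as 5.0, while x_clip and y_clip are simply the inputs scaled by s_x and s_y (both close to 1 for a 90 degree fov and square aspect).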
- Perspective projection is a single 4\times 4 matrix P applied in camera space.
- Its entries are built from the frustum (\text{fov}, a, n, f): s_x = \tfrac{1}{a\tan(\text{fov}/2)}, s_y = \tfrac{1}{\tan(\text{fov}/2)}, and the depth row A, B.
- The bottom row [0\;0\;-1\;0] stashes depth into the output w: w_{\text{clip}} = -z.
- The matrix does not divide; the 1/z shrink happens later, in the perspective divide by w.
- The near/far depth remap z \mapsto z_{\text{ndc}} is non-linear (a 1/z curve), concentrating precision near the camera.
The most common misconception is that the projection matrix "makes far things small". It does
not. Run a vertex through P and the x output
is s_x x — no z in sight, no shrink. All
P achieves is to copy depth into w and remap
z. The perspective only appears in the next step.
That next step — dividing (x_{\text{clip}}, y_{\text{clip}}, z_{\text{clip}})
by w_{\text{clip}} = -z — is the
perspective divide,
and it is where s_x x finally becomes
s_x x / (-z) and distant geometry collapses inward. The matrix loads
the gun; the divide pulls the trigger. Splitting the job this way is what lets clipping happen in
clip space before the divide, where everything is still linear and the frustum test reduces to the
simple inequalities -w \le x, y, z \le w, a test the GPU performs cheaply.
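The clip-then-divide order can be sketched in Python as below. This assumes the OpenGL-style clip volume in which a point is inside when -w <= x, y, z <= w; the function names and the two clip-space vertices are hand-picked for illustration and are not the output of a specific P.

```python
def inside_frustum(clip):
    """Clip-space test, assuming the OpenGL-style volume -w <= x, y, z <= w."""
    x, y, z, w = clip
    # w > 0 also guards the later divide against a zero denominator.
    return w > 0.0 and all(-w <= c <= w for c in (x, y, z))

def perspective_divide(clip):
    """Collapse a clip-space 4-vector to NDC by dividing through by w."""
    x, y, z, w = clip
    return (x / w, y / w, z / w)

# Two clip-space vertices with the same x_clip but different depth in w.
near_vertex = (1.0, 0.5, 0.0, 2.0)
far_vertex = (1.0, 0.5, 6.0, 8.0)

for clip in (near_vertex, far_vertex):
    if inside_frustum(clip):
        print(perspective_divide(clip))
```

Both vertices share x_clip = 1.0, yet they land at x = 0.5 and x = 0.125 after the divide: the shrink comes entirely from w, and the inside test needed only comparisons against w.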
A vertex goes in, w = -z comes out
Drag a camera-space point around: height sets its x,
depth sets its z. The panel shows the matrix
P acting on (x, y, z, 1) and the resulting
clip-space 4-vector. Watch the last component track -z exactly — the
matrix has parked the depth in w. The faint dot shows where the point
would land after the later divide by w (the
x_{\text{clip}}/w the matrix itself never computes), so you can see the
shrink the divide will produce.