The Perspective Projection Matrix

We have the law of perspective — project through the eye, and a point shrinks by n/z. We have a pipeline that does everything by multiplying 4×4 matrices. Now we must reconcile them, and there is a problem staring us down: the perspective rule divides by z, and a matrix multiply can only add and scale — it cannot divide one coordinate by another. This page is the trick that squares that circle, and it is the centrepiece of the whole stage.

Stashing the depth in w

Step 1 — state what we want. After projection, the screen coordinate should be the frustum-scaled x divided by depth. Fold the frustum parameters (\text{fov}, a, n, f) into constants and write the near-plane scales s_x = \dfrac{1}{a\,\tan(\text{fov}/2)} and s_y = \dfrac{1}{\tan(\text{fov}/2)}. With the camera looking down -z, a visible point has negative z and its depth is -z, so we want

x_{\text{screen}} = \frac{s_x\,x}{-z}, \qquad y_{\text{screen}} = \frac{s_y\,y}{-z}.

Step 2 — face the obstacle. A matrix row computes a linear combination of the inputs, a x + b y + c z + d w. There is no way for one row to put a z in the denominator. Multiplication alone can't divide.

Step 3 — the trick: borrow the divide that already exists. The pipeline performs one division for free, after every projection: the perspective divide, where the homogeneous output (x, y, z, w) is collapsed to a 3-vector by dividing through by w. So we don't compute the divide by depth ourselves: we arrange for the matrix to emit a w equal to the depth (which is -z in this camera convention), and let the downstream divide do the work.

Step 4 — set the bottom row to copy depth into w. The output w_{\text{clip}} is the bottom row dotted with the input. The camera looks down -z, so visible points have negative z, and we want w_{\text{clip}} to be the positive depth -z. The bottom row that does this is

\text{(bottom row)} = \begin{bmatrix} 0 & 0 & -1 & 0 \end{bmatrix}, \qquad w_{\text{clip}} = -z.

This is the single most important row in real-time graphics: it does nothing to x, y, z, and quietly copies the depth into the output's w.
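Steps 2 to 4 can be checked in a few lines. This is a minimal sketch in plain Python; the helper name `row_dot` and the sample point are hypothetical, chosen only for illustration. It shows that a matrix row is nothing more than a weighted sum of the inputs, and that the bottom row above returns -z.

```python
def row_dot(row, v):
    """Apply one matrix row to a 4-vector: a*x + b*y + c*z + d*w."""
    return sum(r * c for r, c in zip(row, v))

# Bottom row of the projection matrix (illustrative name).
bottom_row = [0.0, 0.0, -1.0, 0.0]

# A camera-space point in front of the camera: z is negative.
point = [2.0, 1.0, -5.0, 1.0]

# The row ignores x, y and the input w, and negates z.
w_clip = row_dot(bottom_row, point)  # equals -z = 5.0
```

Nothing in `row_dot` can divide by a coordinate, which is the obstacle of Step 2; the row only arranges for the later divide to have the right denominator.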

Building the matrix entry by entry

Step 5 — the x and y rows. We want x_{\text{clip}} = s_x x (so that after the divide by w = -z we get the 1/z shrink). So the top row is just s_x on the diagonal, and likewise s_y for y:

x_{\text{clip}} = s_x\,x, \qquad y_{\text{clip}} = s_y\,y.

Step 6 — the depth row must remap, non-linearly. The third row produces z_{\text{clip}}, and after the divide we need z_{\text{ndc}} = z_{\text{clip}}/w_{\text{clip}} \in [-1, 1] as z runs from -n to -f (that is, as the depth -z runs over [n, f]). Crucially, the divide by w = -z means the result cannot be linear in z: a row that produced only a multiple of z would come out constant after the divide. To hit both endpoints, the row needs a term in z and a constant term. Write the third row as \begin{bmatrix} 0 & 0 & A & B \end{bmatrix}, so z_{\text{clip}} = A z + B and

z_{\text{ndc}} = \frac{A z + B}{-z}.

Demanding z_{\text{ndc}} = -1 at z = -n and z_{\text{ndc}} = +1 at z = -f and solving the two equations gives

A = -\frac{f + n}{f - n}, \qquad B = -\frac{2 f n}{f - n}.
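
For completeness, here is one way to reach these values, using only the two boundary conditions stated above. Substituting z = -n and z = -f into the expression for z_{\text{ndc}} gives

\frac{-A n + B}{n} = -1, \qquad \frac{-A f + B}{f} = +1 \quad\Longrightarrow\quad -A n + B = -n, \qquad -A f + B = f.

Subtracting the first equation from the second eliminates B:

-A\,(f - n) = f + n \quad\Longrightarrow\quad A = -\frac{f + n}{f - n},

and back-substituting into B = A n - n gives

B = -\frac{n\,(f + n)}{f - n} - \frac{n\,(f - n)}{f - n} = -\frac{2 f n}{f - n}.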

Because B \neq 0, the mapping z \mapsto z_{\text{ndc}} is a non-linear 1/z curve — most NDC depth precision bunches up near the camera (the origin of "z-fighting" far away).
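
To see how uneven the remap is, the sketch below evaluates z_ndc at the near plane, the far plane, and the midpoint of the depth range. It is a plain-Python illustration; the function names and the choice n = 1, f = 100 are assumptions made for the example.

```python
def depth_coeffs(n, f):
    """Third-row entries A and B for near distance n and far distance f."""
    A = -(f + n) / (f - n)
    B = -2.0 * f * n / (f - n)
    return A, B

def z_ndc(z, n, f):
    """NDC depth of a camera-space z (negative in front of the camera)."""
    A, B = depth_coeffs(n, f)
    return (A * z + B) / (-z)

n, f = 1.0, 100.0                      # illustrative near and far distances
near_ndc = z_ndc(-n, n, f)             # about -1
far_ndc = z_ndc(-f, n, f)              # about +1
mid_ndc = z_ndc(-(n + f) / 2.0, n, f)  # about 0.98, not 0
```

With these values, the first half of the depth range consumes roughly 99% of the NDC interval, leaving only about 1% for the far half. That is the bunching of precision described above.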

Step 7 — assemble the full matrix. Stacking the four rows:

P = \begin{bmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & -\dfrac{f+n}{f-n} & -\dfrac{2fn}{f-n} \\ 0 & 0 & -1 & 0 \end{bmatrix}.

Step 8 — send a vertex through. Take a camera-space point (x, y, z, 1) and multiply:

P\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \begin{bmatrix} s_x\,x \\ s_y\,y \\ A z + B \\ -z \end{bmatrix} = \begin{bmatrix} x_{\text{clip}} \\ y_{\text{clip}} \\ z_{\text{clip}} \\ w_{\text{clip}} \end{bmatrix}.

Look at the last component: w_{\text{clip}} = -z. The matrix did not shrink anything — it merely positioned the depth in w, ready for the divide that comes next.
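
Steps 5 to 8 can be sketched end to end as follows. This is a minimal plain-Python version, not a reference implementation: the names `perspective` and `mat_vec` are hypothetical, and it assumes that fov is the vertical field of view in radians and that a is the aspect ratio width / height.

```python
import math

def perspective(fov_y, aspect, n, f):
    """Build P row by row (assumes fov_y in radians, aspect = width / height)."""
    t = math.tan(fov_y / 2.0)
    s_x = 1.0 / (aspect * t)
    s_y = 1.0 / t
    A = -(f + n) / (f - n)
    B = -2.0 * f * n / (f - n)
    return [
        [s_x, 0.0, 0.0, 0.0],   # x row: scale only
        [0.0, s_y, 0.0, 0.0],   # y row: scale only
        [0.0, 0.0, A, B],       # depth row: A*z + B
        [0.0, 0.0, -1.0, 0.0],  # bottom row: w_clip = -z
    ]

def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# Illustrative frustum: 90 degree fov, square aspect, n = 1, f = 100.
P = perspective(math.radians(90.0), 1.0, 1.0, 100.0)

# Camera-space point at depth 5 (z = -5).
clip = mat_vec(P, [2.0, 1.0, -5.0, 1.0])
# clip[0] is s_x * x (approximately 2.0 here): no shrink has happened yet.
# clip[3] is -z = 5.0: the depth, parked in w.
```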

Perspective projection is a single 4\times 4 matrix P applied in camera space.

Its entries are built from the frustum (\text{fov}, a, n, f): s_x = \tfrac{1}{a\tan(\text{fov}/2)}, s_y = \tfrac{1}{\tan(\text{fov}/2)}, and the depth row A, B.
The bottom row [0\;0\;-1\;0] stashes depth into the output w: w_{\text{clip}} = -z.
The matrix does not divide — the 1/z shrink happens later, in the perspective divide by w.
The near/far depth remap z \mapsto z_{\text{ndc}} is non-linear (a 1/z curve), concentrating precision near the camera.

The most common misconception is that the projection matrix "makes far things small". It does not. Run a vertex through P and the x output is s_x x, with no z in sight and no shrink. All P does is scale x and y by constants, copy depth into w, and remap z. The perspective only appears in the next step.

That next step, dividing (x_{\text{clip}}, y_{\text{clip}}, z_{\text{clip}}) by w_{\text{clip}} = -z, is the perspective divide, and it is where s_x x finally becomes s_x x / (-z) and distant geometry collapses inward. The matrix loads the gun; the divide pulls the trigger. Splitting the job this way is what lets clipping happen in linear clip space before the divide, where the frustum is bounded by the simple inequalities -w_{\text{clip}} \le x_{\text{clip}},\, y_{\text{clip}},\, z_{\text{clip}} \le w_{\text{clip}}; only after the divide does it become the cube [-1, 1]^3 of NDC.
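
The divide itself, and the clip test that precedes it, can be sketched as below. The function names and the clip-space sample values are hypothetical, hand-picked for illustration rather than produced by a particular frustum. The inside test uses the standard clip-space condition that each of x, y, z lies between -w and w.

```python
def inside_clip_volume(clip):
    """Clip test done before the divide: -w <= x, y, z <= w with w > 0."""
    x, y, z, w = clip
    return w > 0.0 and all(-w <= c <= w for c in (x, y, z))

def perspective_divide(clip):
    """Collapse a clip-space 4-vector to NDC by dividing through by w."""
    x, y, z, w = clip
    return (x / w, y / w, z / w)

# Two points with the same x_clip and y_clip, at depths 5 and 50 (w = -z).
near_point = (2.0, 1.0, 3.0, 5.0)
far_point = (2.0, 1.0, 49.0, 50.0)
# A point whose x_clip exceeds w: outside the frustum, rejected before the divide.
outside_point = (6.0, 1.0, 3.0, 5.0)

near_ndc = perspective_divide(near_point)
far_ndc = perspective_divide(far_point)
```

The two visible points differ only in depth, yet the far one lands ten times closer to the centre of the screen after the divide. That shrink comes entirely from the division by w, not from the matrix.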

A vertex goes in, w = -z comes out

Drag a camera-space point around: height sets its x, depth sets its z. The panel shows the matrix P acting on (x, y, z, 1) and the resulting clip-space 4-vector. Watch the last component track -z exactly — the matrix has parked the depth in w. The faint dot shows where the point would land after the later divide by w (the x_{\text{clip}}/w the matrix itself never computes), so you can see the shrink the divide will produce.