Clip Space and the Perspective Divide

The perspective projection matrix does something sneaky: it does not actually make far things small. Multiply a camera-space vertex by it and you get a 4-vector that still looks innocent — (x_c, y_c, z_c, w_c). The shrinking happens one step later, in a single division that the GPU performs for free. This page follows a vertex through that step, where perspective is finally born.

From the matrix to the pixel, line by line

Step 1 — leave the matrix in clip space. After the projection matrix P, a camera-space vertex (x, y, z, 1) becomes a 4-vector we call clip space:

\begin{pmatrix} x_c \\ y_c \\ z_c \\ w_c \end{pmatrix} = P \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}.
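As a minimal sketch of Step 1, the Python below builds one common form of P and applies it to a single vertex. It assumes the OpenGL convention (camera looking down -z, NDC depth in [-1, 1]) and a gluPerspective-style parameterisation by vertical field of view, aspect ratio, and near/far distances. The helper names `perspective` and `to_clip` are illustrative only and do not belong to any library.

```python
import math

def perspective(fovy_deg, aspect, near, far):
    """One common OpenGL-style perspective matrix, stored row-major.

    Assumed convention: right-handed camera looking down -z,
    NDC depth in [-1, 1].
    """
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]

def to_clip(P, x, y, z):
    """Multiply the homogeneous camera-space point (x, y, z, 1) by P."""
    v = (x, y, z, 1.0)
    return tuple(sum(P[r][c] * v[c] for c in range(4)) for r in range(4))

P = perspective(90.0, 1.0, 1.0, 100.0)
print(to_clip(P, 1.0, 2.0, -5.0))  # x_c, y_c roughly 1 and 2; w_c = 5
```

Nothing has shrunk yet: with a 90° field of view and aspect ratio 1, x_c and y_c are essentially copies of x and y, and w_c simply holds the distance 5.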

Step 2 — notice what w_c is. The clever bottom row of a perspective P is (0, 0, -1, 0), so it copies the negated camera-space depth into the fourth coordinate. With a camera looking down its own -z axis, points in front have z < 0, so

w_c = -z \;>\; 0.

The further away the vertex, the larger w_c. Hold that thought — it is the whole trick.
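To see Step 2 in isolation, the sketch below dots only the bottom row of P with a homogeneous vertex. The row (0, 0, -1, 0) is the one used by the usual OpenGL-style perspective matrix (an assumption about convention), and `w_clip` is a hypothetical helper name.

```python
# Bottom row of an OpenGL-style perspective matrix (assumed convention).
# It ignores x, y and the constant 1, and copies -z into w_c.
BOTTOM_ROW = (0.0, 0.0, -1.0, 0.0)

def w_clip(x, y, z):
    """Dot the bottom row of P with the homogeneous vertex (x, y, z, 1)."""
    return sum(r * v for r, v in zip(BOTTOM_ROW, (x, y, z, 1.0)))

# The further the vertex (more negative z), the larger w_c.
for z in (-1.0, -10.0, -100.0):
    print(z, w_clip(0.5, -0.5, z))
```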

Step 3 — clip the triangle now, before any division. The GPU throws away (or trims) geometry that falls outside the view volume. In clip space the test is a cheap pair of inequalities on each coordinate (written here in the OpenGL convention that the rest of this page follows),

-w_c \le x_c \le w_c, \qquad -w_c \le y_c \le w_c, \qquad -w_c \le z_c \le w_c.

No square roots, no division: just comparisons against \pm w_c. That cheapness is why the pipeline clips at this stage, and the stage is named clip space after the job it does.
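A minimal sketch of the Step 3 test for a single vertex, in the same OpenGL convention. It only classifies points; real clipping works on whole triangles and splits the ones that cross a plane, which this hypothetical `inside_clip_volume` helper does not attempt.

```python
def inside_clip_volume(xc, yc, zc, wc):
    """Return True if a clip-space vertex lies in the view volume.

    Uses the OpenGL-style bounds -w_c <= x_c, y_c, z_c <= w_c:
    six comparisons, no square roots, no division.
    """
    return all(-wc <= c <= wc for c in (xc, yc, zc))

print(inside_clip_volume(1.0, 2.0, 3.0, 5.0))   # True: every |coordinate| <= 5
print(inside_clip_volume(6.0, 0.0, 0.0, 5.0))   # False: x_c > w_c
print(inside_clip_volume(0.0, 0.0, 0.0, -1.0))  # False: w_c < 0 leaves no valid range
```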

Step 4 — perform the perspective divide. Divide every component by w_c. This lands the vertex in normalised device coordinates (NDC):

\begin{pmatrix} x_n \\ y_n \\ z_n \end{pmatrix} = \begin{pmatrix} x_c / w_c \\ y_c / w_c \\ z_c / w_c \end{pmatrix}.
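Step 4 as code: a sketch that assumes the vertex already passed Step 3, so its w_c is positive. The guard is an illustrative choice rather than something the hardware does, and `perspective_divide` is a hypothetical name.

```python
def perspective_divide(xc, yc, zc, wc):
    """Map a clip-space vertex to normalised device coordinates.

    Assumes clipping already ran, so wc > 0; the check makes a violation
    of that assumption loud instead of silently mirroring the point.
    """
    if wc <= 0.0:
        raise ValueError("w_c must be positive; clip before dividing")
    return (xc / wc, yc / wc, zc / wc)

print(perspective_divide(1.0, 2.0, 3.0, 4.0))  # (0.25, 0.5, 0.75)
```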

Step 5 — read off the consequence. Every vertex that survived Step 3 has a positive w_c, so dividing each of its inequalities by w_c preserves their direction and squeezes the whole visible world into the tidy cube

-1 \le x_n \le 1, \qquad -1 \le y_n \le 1, \qquad -1 \le z_n \le 1.
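One concrete check of Step 5 for depth: with the OpenGL-style matrix assumed above, the near plane (z = -n) lands on z_n = -1 and the far plane (z = -f) on z_n = +1. The sketch uses only the third and fourth rows of that matrix; `ndc_depth` is a hypothetical helper, and the printed values are approximate because of floating-point rounding.

```python
def ndc_depth(z, near, far):
    """Normalised depth z_n = z_c / w_c for a camera-space depth z (z < 0).

    z_c comes from the third row of an OpenGL-style perspective matrix,
    w_c = -z from the fourth row (assumed convention).
    """
    zc = (far + near) / (near - far) * z + 2.0 * far * near / (near - far)
    wc = -z
    return zc / wc

near, far = 1.0, 100.0
print(ndc_depth(-near, near, far))  # about -1: near face of the cube
print(ndc_depth(-far, near, far))   # about +1: far face of the cube
```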

Step 6 — see why distant things shrink. Two vertices with the same clip-space x_c but different depths get divided by different w_c = -z. The far one (larger w_c) is divided by more, so its x_n is pulled closer to the centre:

x_n = \frac{x_c}{w_c} = \frac{x_c}{-z}.

Double the distance, halve the on-screen size. That single division by w_c = -z, not the matrix, is what makes parallel rails appear to meet at the horizon. Perspective is a fraction.
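A quick numerical check of Step 6, under the simplifying assumption that the matrix's horizontal scale factor is 1 (roughly what a 90° vertical field of view with aspect ratio 1 gives), so x_c equals the camera-space x. `ndc_x` is a hypothetical helper.

```python
def ndc_x(x, z, sx=1.0):
    """x_n = x_c / w_c, with x_c = sx * x and w_c = -z.

    sx stands in for the matrix's horizontal scale (assumed 1.0 here).
    """
    return (sx * x) / (-z)

# Same sideways offset x = 1, distance doubling each time.
for z in (-2.0, -4.0, -8.0):
    print(-z, ndc_x(1.0, z))  # 0.5, then 0.25, then 0.125
```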

After the projection matrix, every vertex passes through two fixed-function steps, always in the same order: clipping (Step 3), then the perspective divide (Step 4).

It is tempting to imagine dividing first and clipping against the neat cube afterward. That order is a disaster. A vertex behind the camera has z > 0, so w_c = -z < 0; a vertex exactly on the camera plane (z = 0) gives w_c = 0. Dividing by a negative w_c flips the signs of x_n and y_n, mirroring the point to the opposite side of the screen, and dividing by 0 is undefined. A triangle straddling the camera would tear into nonsense.
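The hazard is easy to reproduce. This sketch divides first, with no clipping, using hypothetical clip-space vertices that share x_c but sit on opposite sides of the camera (w_c = 4 and w_c = -4), plus one on the camera plane. Python raises ZeroDivisionError for the last case; GPU floating-point hardware would typically produce infinities or NaNs instead, which is no better.

```python
def naive_divide(xc, yc, zc, wc):
    """Divide first with no clipping and no guard (the wrong order)."""
    return (xc / wc, yc / wc, zc / wc)

in_front = naive_divide(2.0, 0.0, 0.0, 4.0)  # w_c = 4: in front of the camera
behind = naive_divide(2.0, 0.0, 0.0, -4.0)   # w_c = -4: behind the camera
print(in_front[0], behind[0])  # 0.5 -0.5: mirrored across the centre

try:
    naive_divide(2.0, 0.0, 0.0, 0.0)         # w_c = 0: on the camera plane
except ZeroDivisionError:
    print("w_c = 0: the divide is undefined")
```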

Clipping in clip space sidesteps both hazards. The inequality -w_c \le z_c \le w_c can only hold when w_c \ge 0, so every vertex behind the camera fails it and is trimmed away before the divide ever sees it. Its near-plane half, -w_c \le z_c, is stricter still: with the usual OpenGL-style perspective matrix it cuts exactly where w_c = n, the near distance, so every surviving vertex has w_c \ge n > 0. The near plane of the view frustum is precisely the guard rail that keeps w_c safely positive. That is the deep reason the pipeline carries a fourth coordinate all the way to the very last moment instead of dividing the instant the matrix is done.
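To make the guard rail concrete, here is a minimal sketch of clipping one edge against the near plane in clip space and only then dividing. It assumes the OpenGL-style depth row used earlier (n = 1, f = 100) and a horizontal scale of 1; `clip_point` and `clip_edge_near` are hypothetical helpers. Interpolating in clip space is valid because clip coordinates are an affine function of camera coordinates. A real clipper handles whole polygons against all six planes, but the near-plane step is the one that keeps w_c positive.

```python
def clip_point(x, z, near=1.0, far=100.0):
    """Clip-space (x_c, y_c, z_c, w_c) for the camera-space point (x, 0, z).

    Assumes an OpenGL-style perspective matrix with horizontal scale 1.
    """
    zc = (far + near) / (near - far) * z + 2.0 * far * near / (near - far)
    return (x, 0.0, zc, -z)

def clip_edge_near(a, b):
    """Keep the part of edge a-b where z_c + w_c >= 0 (the near-plane test).

    Returns (a', b'), or None if the whole edge fails the test.
    """
    da, db = a[2] + a[3], b[2] + b[3]   # signed distance to the near plane
    if da < 0 and db < 0:
        return None                     # entirely on the camera's side of the near plane
    if da >= 0 and db >= 0:
        return (a, b)                   # entirely inside; nothing to trim
    t = da / (da - db)                  # parameter where the distance crosses zero
    hit = tuple(pa + t * (pb - pa) for pa, pb in zip(a, b))
    return (a, hit) if da >= 0 else (hit, b)

front = clip_point(1.0, -5.0)   # in front of the camera, w_c = 5
behind = clip_point(1.0, 5.0)   # behind the camera, w_c = -5
kept = clip_edge_near(front, behind)
hit = kept[1]
print(hit[3])           # new endpoint's w_c: about 1.0, the near distance
print(hit[2] / hit[3])  # its z_n after dividing: about -1, the cube's near face
```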