The Viewport Transform

After the perspective divide a vertex sits in a clean abstract cube: normalised device coordinates, with x_n, y_n \in [-1, 1]. Lovely, but nobody owns a monitor that runs from -1 to 1. Real screens are measured in pixels, counted from the top-left corner. The viewport transform is the short, dull, essential map that turns the cube into your actual rectangle of pixels.

From the cube to the rectangle, line by line

Step 1 — name the two rectangles. The source is the NDC square [-1, 1] \times [-1, 1]. The target is the framebuffer rectangle [0, W] \times [0, H], where W and H are the width and height in pixels.

Step 2 — shift the range from [-1, 1] to [0, 2]. Add one:

x_n + 1 \;\in\; [0, 2].

Step 3 — squash [0, 2] down to [0, 1]. Halve it:

\frac{x_n + 1}{2} \;\in\; [0, 1].

Step 4 — stretch [0, 1] across the screen width. Multiply by W. That is the horizontal pixel coordinate:

x_{\text{screen}} = \frac{x_n + 1}{2}\, W.

Step 5 — do y the same way, but flip it. Here is the one trap. In NDC, y points up; on a screen, pixel rows are counted downward from the top-left corner. So the y map must reverse direction — use 1 - y_n in place of y_n + 1:

y_{\text{screen}} = \frac{1 - y_n}{2}\, H.

Check it: the top of NDC, y_n = +1, gives y_{\text{screen}} = 0 (the top edge of the top row), and the bottom, y_n = -1, gives y_{\text{screen}} = H (the bottom edge of the bottom row). The picture is the right way up.
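Steps 2 to 5 can be sketched in a few lines of Python. The function name `ndc_to_screen` and its parameters are illustrative and not taken from any graphics API; the sketch assumes a top-left origin with y increasing downward, as in the text.

```python
def ndc_to_screen(x_n, y_n, width, height):
    """Map NDC (x_n, y_n) in [-1, 1] to continuous screen coordinates.

    Assumes a top-left origin with y increasing downward.
    """
    x_screen = (x_n + 1.0) / 2.0 * width
    y_screen = (1.0 - y_n) / 2.0 * height  # flipped: NDC y points up
    return x_screen, y_screen


# Corners and centre of a hypothetical 800 x 600 framebuffer.
print(ndc_to_screen(-1.0, 1.0, 800, 600))   # (0.0, 0.0)      top-left
print(ndc_to_screen(0.0, 0.0, 800, 600))    # (400.0, 300.0)  centre
print(ndc_to_screen(1.0, -1.0, 800, 600))   # (800.0, 600.0)  bottom-right
```

The results are continuous coordinates, not pixel indices: 800.0 is the right-hand edge of the last column, not a column of its own.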

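Step 6 — carry depth along too. Under the OpenGL convention, the third NDC coordinate z_n \in [-1, 1] is remapped to [0, 1] and stored for the depth buffer. It is the same affine idea, with no flip. (Direct3D, Vulkan and Metal already define NDC depth on [0, 1], so with the default depth range this step leaves z unchanged there.)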

z_{\text{depth}} = \frac{z_n + 1}{2} \;\in\; [0, 1].

Step 7 — recognise the whole thing. Every line above is just a scale plus a translate per axis. No rotation, no perspective — that work is already done. The viewport transform is the gentle final handshake that delivers a vertex to a pixel.
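To make the "scale plus translate" reading concrete, here is a minimal sketch that stores one (scale, offset) pair per axis and applies them uniformly. The names `viewport_coefficients` and `apply_viewport` are hypothetical, and the depth row assumes z_n \in [-1, 1] as in Step 6.

```python
def viewport_coefficients(width, height):
    """Return a (scale, offset) pair for each of the x, y and z axes."""
    return {
        "x": (width / 2.0, width / 2.0),
        "y": (-height / 2.0, height / 2.0),  # negative scale is the y flip
        "z": (0.5, 0.5),                     # assumes z_n in [-1, 1]
    }


def apply_viewport(ndc, width, height):
    """Apply out = scale * value + offset independently on each axis."""
    coeffs = viewport_coefficients(width, height)
    return tuple(
        coeffs[axis][0] * value + coeffs[axis][1]
        for axis, value in zip("xyz", ndc)
    )


print(apply_viewport((0.5, 0.5, 0.0), 800, 600))  # (600.0, 150.0, 0.5)
```

Expanding (x_n + 1) W / 2 gives (W/2) x_n + W/2, which is where the x pair comes from; the y and z pairs follow the same way from their formulas.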

The viewport transform maps normalised device coordinates onto the framebuffer:

It sends the NDC square [-1, 1]^2 to the pixel rectangle [0, W] \times [0, H].
It is a pure scale-plus-translate per axis: x_{\text{screen}} = \tfrac{x_n + 1}{2} W.
The y axis is flipped, because screens count rows downward: y_{\text{screen}} = \tfrac{1 - y_n}{2} H.
Depth is remapped to [0, 1] for the z-buffer, z_{\text{depth}} = \tfrac{z_n + 1}{2}.

A pixel is not a point; it is a little square with area. A framebuffer of width W has columns indexed 0, 1, \dots, W-1, and the centre of column i sits at the half-integer coordinate i + 0.5. Rasterisers sample at those centres, so a point that lands exactly on an integer boundary is ambiguous between two pixels — the classic off-by-one that leaves a one-pixel seam between two abutting quads.
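The relationship between continuous coordinates, column indices and centres can be sketched as follows. The helper names are illustrative, and clamping the right-hand edge into the last column is one possible choice for this sketch, not a rule any particular rasteriser is claimed to follow.

```python
import math


def pixel_containing(x_screen, width):
    """Index of the column whose square covers x_screen.

    Column i covers [i, i + 1). The result is clamped so that the
    right-hand edge x_screen == width falls in the last column.
    """
    i = math.floor(x_screen)
    return min(max(i, 0), width - 1)


def pixel_centre(i):
    """Continuous coordinate of the centre of column i."""
    return i + 0.5


print(pixel_containing(2.9, 800))   # 2
print(pixel_containing(3.0, 800))   # 3
print(pixel_centre(3))              # 3.5

# A point on the boundary x = 3.0 is equally far from both neighbouring centres.
print(abs(3.0 - pixel_centre(2)) == abs(3.0 - pixel_centre(3)))  # True
```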

That is why the convention is to treat (0.5, 0.5) as the centre of the top-left pixel, and why a full-screen blit that maps NDC [-1, 1] onto [0, W] puts its endpoints on pixel edges, not centres. Get the half-pixel offset wrong and your beautiful render is mysteriously blurry: every output pixel samples halfway between two source texels, so the texture filter averages neighbours instead of copying one.
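Inverting the viewport map at a pixel centre shows where the half-pixel offset enters. This is a sketch under the same top-left, y-down convention; `pixel_centre_to_ndc` and `texel_centre_uv` are hypothetical helper names, and the [0, 1] texture coordinate range is an assumption of the example.

```python
def pixel_centre_to_ndc(i, j, width, height):
    """NDC coordinates of the centre of pixel (column i, row j)."""
    x_n = 2.0 * (i + 0.5) / width - 1.0
    y_n = 1.0 - 2.0 * (j + 0.5) / height  # undo the y flip
    return x_n, y_n


def texel_centre_uv(i, j, width, height):
    """Texture coordinates in [0, 1] that land on the centre of texel (i, j)."""
    return (i + 0.5) / width, (j + 0.5) / height


# Top-left pixel of a 4 x 4 target: the centre is inset from the NDC corner.
print(pixel_centre_to_ndc(0, 0, 4, 4))  # (-0.75, 0.75)
print(texel_centre_uv(0, 0, 4, 4))      # (0.125, 0.125)
```

Dropping the `+ 0.5` in either function gives coordinates that sit on the corner shared by four texels rather than on a centre, which is the situation the blur described above comes from.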