From the cube to the rectangle, line by line
Step 1 — name the two rectangles. The source is the NDC square
[-1, 1] \times [-1, 1]. The target is the framebuffer rectangle
[0, W] \times [0, H], where W and
H are the width and height in pixels.
Step 2 — shift the range from [-1, 1] to
[0, 2]. Add one:
x_n + 1 \;\in\; [0, 2].
Step 3 — squash [0, 2] down to
[0, 1]. Halve it:
\frac{x_n + 1}{2} \;\in\; [0, 1].
Step 4 — stretch [0, 1] across the screen width. Multiply
by W. That is the horizontal pixel coordinate:
x_{\text{screen}} = \frac{x_n + 1}{2}\, W.
Step 5 — do y the same way, but flip it. Here is the one
trap. In NDC, y points up; on a screen, pixel rows are counted
downward from the top-left corner. So the y map must reverse
direction — use 1 - y_n in place of y_n + 1:
y_{\text{screen}} = \frac{1 - y_n}{2}\, H.
Check it: the top of NDC, y_n = +1, gives
y_{\text{screen}} = 0 (the top edge of the first row), and the bottom,
y_n = -1, gives y_{\text{screen}} = H (the bottom edge of the
last row). The picture is the right way up.
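Steps 2 to 5 can be sketched in a few lines of Python. The function name `ndc_to_screen_xy` and the 800 by 600 framebuffer are illustrative assumptions, not part of any particular graphics API.

```python
def ndc_to_screen_xy(x_n, y_n, width, height):
    """Map NDC x, y in [-1, 1] to continuous screen coordinates.

    Hypothetical helper: x grows to the right, y grows downward
    from the top-left corner, as in the steps above.
    """
    x_screen = (x_n + 1.0) / 2.0 * width    # shift, halve, stretch
    y_screen = (1.0 - y_n) / 2.0 * height   # same idea, with the flip
    return x_screen, y_screen


# Assumed 800 x 600 framebuffer, purely for illustration.
W, H = 800, 600
top_left = ndc_to_screen_xy(-1.0, 1.0, W, H)       # (0.0, 0.0)
centre = ndc_to_screen_xy(0.0, 0.0, W, H)          # (400.0, 300.0)
bottom_right = ndc_to_screen_xy(1.0, -1.0, W, H)   # (800.0, 600.0)
```

The three corner checks mirror the hand check in Step 5: NDC top maps to screen y = 0 and NDC bottom maps to screen y = H.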
Step 6 — carry depth along too. The third NDC coordinate
z_n \in [-1, 1] (the OpenGL-style range assumed here) is remapped to [0, 1] and stored
for the depth buffer. It is the same affine idea, with no flip:
z_{\text{depth}} = \frac{z_n + 1}{2} \;\in\; [0, 1].
Step 7 — recognise the whole thing. Every line above is just a
scale plus a translate per axis. No rotation, no perspective — that work is already
done. The viewport transform is the gentle final handshake that delivers a vertex to a pixel.
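One way to see the scale-plus-translate structure is to expand each formula into the form a * v + b and precompute the two coefficients per axis. The sketch below does that; `viewport_coefficients` and `apply_viewport` are hypothetical names, the framebuffer size is assumed, and the [0, 1] depth range simply follows the formulas above.

```python
def viewport_coefficients(width, height):
    """Return (scale, offset) pairs for x, y and z.

    Expanding (x_n + 1) / 2 * W gives (W / 2) * x_n + W / 2, and
    (1 - y_n) / 2 * H gives (-H / 2) * y_n + H / 2; the negative
    scale is the y flip. Depth uses scale 1/2 and offset 1/2.
    """
    return (
        (width / 2.0, width / 2.0),
        (-height / 2.0, height / 2.0),
        (0.5, 0.5),
    )


def apply_viewport(ndc, coefficients):
    """Apply one scale and one offset per axis to an (x, y, z) tuple."""
    return tuple(s * v + t for v, (s, t) in zip(ndc, coefficients))


coeffs = viewport_coefficients(800, 600)  # assumed framebuffer size
screen = apply_viewport((0.5, 0.5, 0.0), coeffs)  # (600.0, 150.0, 0.5)
```

Because the coefficients depend only on W and H, they can be computed once per viewport and reused for every vertex.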
The viewport transform maps normalised device coordinates onto the framebuffer:
- It sends the NDC square [-1, 1]^2 to the pixel rectangle [0, W] \times [0, H].
- It is a pure scale-plus-translate per axis: x_{\text{screen}} = \tfrac{x_n + 1}{2} W.
- The y axis is flipped, because screens count rows downward: y_{\text{screen}} = \tfrac{1 - y_n}{2} H.
- Depth is remapped to [0, 1] for the z-buffer, z_{\text{depth}} = \tfrac{z_n + 1}{2}.
A pixel is not a point; it is a little square with area. A framebuffer of width
W has columns indexed 0, 1, \dots, W-1, and the
centre of column i sits at the half-integer coordinate
i + 0.5. Rasterisers sample at those centres, so a point that lands
exactly on an integer boundary is ambiguous between two pixels — the classic off-by-one that leaves a
one-pixel seam between two abutting quads.
That is why the convention is to treat (0.5, 0.5) as the centre of the
top-left pixel, and why a full-screen blit that maps NDC [-1, 1] onto
[0, W] puts NDC \pm 1 on pixel edges, not centres. Get the half-pixel offset
wrong and your beautiful render is mysteriously blurry: every sample lands halfway between two
source texels, so bilinear filtering blends the neighbours instead of returning one of them.
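The relationship between pixel indices, pixel centres and NDC can be made concrete with a small sketch. Both helper names are hypothetical, the width of 4 is chosen only to keep the numbers readable, and the tie-break used here (a boundary point goes to the pixel on its right) is an assumption of the sketch; real rasterisers define their own fill rules.

```python
import math


def pixel_containing(x_screen, width):
    """Index of the column whose square covers x_screen.

    Column i spans [i, i + 1). Values outside [0, width) are
    clamped into the first or last column (a choice made for
    this sketch, not a rule from any specification).
    """
    return max(0, min(int(math.floor(x_screen)), width - 1))


def pixel_centre_to_ndc(i, width):
    """NDC x coordinate of the centre of column i, at i + 0.5."""
    return (i + 0.5) / width * 2.0 - 1.0


W = 4  # deliberately tiny, assumed width
centres = [pixel_centre_to_ndc(i, W) for i in range(W)]
# centres == [-0.75, -0.25, 0.25, 0.75]: no centre sits at -1 or +1
```

The list shows the half-pixel inset directly: the outermost centres sit 1/W inside the NDC range, which is exactly the offset a blit has to account for.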