Perspective and the Frustum

Orthographic projection kept every size honest — and looked flat as a blueprint for it. The real world isn't flat: hold your thumb at arm's length and it shrinks; bring it close and it eclipses the moon. Distant things look smaller. Perspective projection captures this because real cameras and eyes gather light through a single point, so rays converge rather than running parallel. That convergence is the whole game.

Why distance shrinks: similar triangles

Put the eye at the origin, looking down the depth axis z. A point in camera space sits at height x and depth z. Place a flat near plane (the "film") at depth n. The projected point x' is where the ray from the eye through the point pierces that plane.

Step 1 — spot the two similar triangles. The ray from the eye to the point and the ray from the eye to its image lie on the same line. So the big triangle (eye, depth z, height x) and the small one (eye, depth n, height x') are similar — same shape, different scale. Their legs are in proportion:

\frac{x'}{n} = \frac{x}{z}.

Step 2 — solve for the projected height. Multiply both sides by n:

x' = \frac{n\,x}{z}.

Step 3 — read the shrink. The image height is the true height x scaled by n/z. Push the point twice as far back (double z) and x' halves: farther is smaller, and it falls off as 1/z. The vertical coordinate behaves identically:

y' = \frac{n\,y}{z}.

That lone division by z — absent from orthographic projection — is what makes a corridor recede to a vanishing point. (The straight-line distance from the eye that makes "twice as far" precise is the Euclidean length, built on the Pythagorean theorem; but for the on-axis shrink it is the depth z that governs the scale.)

The frustum: the visible slab of space

Rays converging at the eye carve out a pyramid; clip off its tip (everything closer than the near plane) and its base (everything past the far plane) and you get a truncated pyramid — the frustum. It is exactly the region the camera can see, and three numbers pin it down.

Step 4 — the vertical field of view sets the opening angle. The fov is the total vertical angle the camera takes in. Half of it, together with the near distance, fixes how tall the near plane is — the top edge sits at

t = n \cdot \tan\!\left(\frac{\text{fov}}{2}\right), \qquad b = -t.

A wide fov flings the frustum walls outward (you see more, but with stronger distortion); a narrow fov squeezes them in (a zoomed, telephoto look).

Step 5 — the aspect ratio sets the width. Screens are wider than they are tall. The aspect a = \text{width}/\text{height} stretches the horizontal bounds to match:

r = a \cdot t, \qquad l = -r.

Step 6 — the near and far planes cap the depth. The frustum runs from depth n to depth f. Anything nearer than n, farther than f, or outside the four slanted side walls is clipped — discarded before it ever reaches a pixel. So the frustum is fully described by (\text{fov}, a, n, f).

Perspective projection sends rays through a single point at the eye.

Foreshortening is the 1/z law made visible: a hallway's far end is drawn small, its near end large, and the floorboards rush together toward a vanishing point. Your brain reads that convergence as depth — it is the single strongest cue we have that a flat image depicts a 3-D world.

Crank the fov up and the effect turns grotesque. With the opening angle wide, objects near the edges sit at steep angles to the axis, where the projection stretches them dramatically — the classic "fish-eye" bulge, beloved of skateboarding videos and dreaded by portrait photographers. A narrow fov does the opposite: it flattens depth (a telephoto "compression", making distant mountains loom). Choosing the fov is choosing how aggressively the 1/z shrink bites — a 60°–90° fov is the usual comfortable middle for a game.

Project through the eye

Side-on view: the eye is at the left, the near plane is the vertical line just in front of it, and the slanted lines are the frustum's top and bottom edges. Two equal objects (the dots) sit at different depths; the rays from the eye through them strike the near plane at their projected heights x' = n\,x/z — notice the far one lands closer to the axis, so it renders smaller. Drag the fov slider to swing the frustum walls wider (fish-eye) or narrower (telephoto), and watch the projected positions shift.