Projection & the Camera

Stand between the rails of a long straight railway and look down the track. The two rails are exactly parallel and never meet, yet your eye swears they rush together to a single point on the horizon. That everyday illusion is perspective, and it is the last geometric step of the 3-D graphics pipeline: after all the 3-D transforms have arranged the world in front of a virtual camera, projection flattens that 3-D world onto the flat 2-D image you actually see on screen.

Put the camera (the eye) at the origin, looking down the z-axis into the scene, so z is depth: how far away a thing is. Place a flat image plane at distance f (the "focal length") in front of the eye. A point in the world at (x, y, z) is projected by drawing a straight ray from the eye through the point and seeing where that ray pierces the image plane. Similar triangles hand us the answer immediately:

x' = f\,\frac{x}{z}, \qquad y' = f\,\frac{y}{z}.
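The two formulas translate almost directly into code. The following is a minimal sketch, assuming points are already in camera space with z measured along the viewing direction; the function name `project` is illustrative and not taken from any library.

```python
def project(point, f):
    """Perspective-project a camera-space point (x, y, z) onto the image plane at distance f."""
    x, y, z = point
    if z <= 0:
        # The formula only makes sense for points in front of the eye.
        raise ValueError("point must be in front of the eye (z > 0)")
    return (f * x / z, f * y / z)


print(project((2.0, 1.0, 4.0), f=2.0))  # (1.0, 0.5)
print(project((2.0, 1.0, 8.0), f=2.0))  # (0.5, 0.25): twice as deep, half the size
```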

Divide by depth — and things shrink

Everything interesting lives in that division by z. Two objects the same real size but at different depths project to different image sizes: the far one is divided by a bigger z, so it comes out smaller. Double an object's distance and it appears exactly half as tall. That is precisely why the rails seem to close in on each other as they recede (the gap between them is a fixed width divided by an ever larger z), and why a friend walking away seems to dwindle. This step is called the perspective divide, and it is the single operation that turns flat, lifeless parallel lines into a scene with real depth.

Notice this is not a linear operation — you cannot capture "divide by z" with a plain matrix multiply, because the output depends on z in the denominator. This is the deep reason graphics carries the homogeneous w-coordinate: the projection matrix cunningly loads the depth z into w, and the hardware then divides x, y by w afterwards. The divide by w is the perspective divide.
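To make the w trick concrete, here is a small sketch in plain Python. The matrix below is a deliberately simplified one, assumed for illustration: its bottom row copies z into w and it does nothing else. The projection matrices used by real graphics APIs also remap depth and the field of view, so treat the names `projection_matrix` and `project_homogeneous` as hypothetical.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def projection_matrix(f):
    """Simplified perspective matrix: scales x and y by f and copies z into w."""
    return [
        [f, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],  # this row produces w = z
    ]


def project_homogeneous(point, f):
    """Linear step (matrix multiply) followed by the nonlinear divide by w."""
    x, y, z = point
    clip = mat_vec(projection_matrix(f), [x, y, z, 1.0])
    w = clip[3]  # equals z; assumed nonzero here
    return (clip[0] / w, clip[1] / w)


print(project_homogeneous((6.0, 0.0, 2.0), f=4.0))  # (12.0, 0.0)
```

The matrix multiply itself stays linear; all of the nonlinearity is confined to the final division, which is why it can be left to a fixed step after the matrix.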

The view frustum: what the camera can see

A camera does not see everything: it sees a truncated pyramid of space called the view frustum. The pyramid's apex is the eye; its sloping sides are the edges of your field of view; and it is cut off by two planes perpendicular to the depth axis: a near plane (nothing closer than this is drawn) and a far plane (nothing beyond it is drawn). Anything outside this truncated pyramid is clipped, thrown away before it ever costs a pixel. Everything inside gets projected onto the image plane.
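A point-in-frustum test shows how little geometry is involved. This sketch assumes a symmetric frustum described by a vertical field-of-view angle, an aspect ratio (width divided by height), and near and far depths; all parameter names are illustrative. Real pipelines clip whole triangles rather than testing single points, so this covers only the containment idea.

```python
import math


def in_frustum(point, fov_y_deg, aspect, near, far):
    """Return True if a camera-space point lies inside a symmetric view frustum."""
    x, y, z = point
    # Near and far planes cap the depth range.
    if not (near <= z <= far):
        return False
    # The sloping sides widen linearly with depth.
    half_height = z * math.tan(math.radians(fov_y_deg) / 2.0)
    half_width = half_height * aspect
    return abs(x) <= half_width and abs(y) <= half_height


print(in_frustum((0.0, 0.0, 5.0), 90.0, 1.0, 0.1, 100.0))   # True: straight ahead
print(in_frustum((0.0, 0.0, 0.05), 90.0, 1.0, 0.1, 100.0))  # False: closer than the near plane
print(in_frustum((10.0, 0.0, 5.0), 90.0, 1.0, 0.1, 100.0))  # False: outside the side of the frustum
```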

A wider frustum (a larger field-of-view angle) crams more of the world into the same image, so each object appears smaller — a wide-angle lens. A narrow frustum acts like a telephoto lens, magnifying a small slice of the scene. The frustum, in other words, is the camera's lens, expressed as pure geometry.
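The link between field of view and focal length can be written down directly. The sketch below assumes the image plane has a fixed half-height (normalized to 1 here, which is a choice made for this example), so that tan(fov / 2) = half_height / f. The function names are hypothetical.

```python
import math


def focal_length_from_fov(fov_deg, half_height=1.0):
    """Focal length at which an image plane of the given half-height spans fov_deg."""
    return half_height / math.tan(math.radians(fov_deg) / 2.0)


def projected_height(height, depth, f):
    """Image-plane height of an object of the given real height at the given depth."""
    return f * height / depth


wide = focal_length_from_fov(90.0)  # wide-angle lens: short focal length
tele = focal_length_from_fov(30.0)  # telephoto lens: long focal length

# The same object at the same depth looks smaller through the wide lens.
print(projected_height(2.0, 10.0, wide) < projected_height(2.0, 10.0, tele))  # True
```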

Orthographic: the other kind of projection

Sometimes you want the opposite of perspective. In an orthographic projection the rays do not converge to the eye — they run perfectly parallel, straight onto the image plane, and there is no divide by z. Depth is simply dropped: x' = x, y' = y, regardless of how far away the point is.

This means an object never changes size with distance — two identical cubes, one near and one far, project to identical squares. That is useless for a natural-looking game but perfect for engineering drawings, CAD, floor plans, and the "2.5-D" look of many strategy games, where you need a far wall to measure the same as a near wall. Perspective looks real; orthographic preserves measurements. Every 3-D tool lets you flip between the two.
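In code, orthographic projection is almost nothing at all, which is the point. A minimal sketch, with an illustrative function name and two made-up cube corners at different depths:

```python
def project_orthographic(point):
    """Orthographic projection: keep x and y, ignore depth entirely."""
    x, y, _z = point
    return (x, y)


near_corner = (1.0, 1.0, 2.0)   # corner of a cube close to the camera
far_corner = (1.0, 1.0, 10.0)   # same corner of an identical cube far away

print(project_orthographic(near_corner))  # (1.0, 1.0)
print(project_orthographic(far_corner))   # (1.0, 1.0)
```

Both corners land on the same image coordinates, so the two cubes measure the same on the drawing regardless of distance.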

Slide it into the distance

The interactive figure for this section shows a side view of the camera. The eye sits at the origin on the left; the vertical line is the image plane at focal distance f; the two faint sloping lines are the edges of the view frustum. The tall marker is an object of fixed height, and the ray runs from the eye through its top to the image plane. Push the object into the distance and its projected height x' = f\,x/z shrinks as z grows: the perspective divide, live.

Worked example: project two points

Take a focal length f = 4. A point sits at height x = 6 at depth z = 2 (close to the camera). Its image-plane coordinate is:

x' = f\,\frac{x}{z} = 4\cdot\frac{6}{2} = 12.

Now move an identical point (still x = 6) three times as far away, to z = 6:

x' = 4\cdot\frac{6}{6} = 4.

Same real height, but tripling the depth divided the on-screen height by three, from 12 down to 4. Under an orthographic projection both points would land at x' = 6 — no shrinking at all. That contrast, in two lines of arithmetic, is the whole difference between a photograph and a blueprint.
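The same arithmetic can be checked in a few lines. This sketch uses the numbers from the worked example (f = 4, x = 6, depths 2 and 6); the helper name `image_heights` is illustrative.

```python
def image_heights(x, depths, f):
    """For each depth, return (perspective x', orthographic x') for a point at height x."""
    return [(f * x / z, x) for z in depths]


for z, (persp, ortho) in zip((2.0, 6.0), image_heights(6.0, (2.0, 6.0), f=4.0)):
    print(f"z = {z}: perspective x' = {persp}, orthographic x' = {ortho}")
# z = 2.0: perspective x' = 12.0, orthographic x' = 6.0
# z = 6.0: perspective x' = 4.0, orthographic x' = 6.0
```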

The formula x' = f\,x/z has a landmine in it: if an object drifts to z = 0, right onto the camera's eye, you divide by zero, and as z approaches zero the projected coordinate blows up toward infinity. Points with z slightly behind the eye (negative z) are even nastier: the division flips the sign of the result, so they project mirrored onto the wrong side of the image and produce garbage triangles smeared across the screen.

This is exactly why the frustum has a near plane at some small positive depth. Its job is not artistic — it is to guarantee z is never zero or negative by clipping away anything too close before the divide ever happens. If you have ever pushed a game camera inside a wall and watched the geometry explode into flickering shards, you were watching the near-plane clip fight a losing battle. Never let z reach the eye.
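Near-plane clipping can be illustrated on a single line segment. The sketch below assumes camera-space points with z as depth and cuts a segment at the plane z = near by linear interpolation; real renderers do the equivalent on triangles, usually in homogeneous clip space, so this shows only the idea. The function name is hypothetical.

```python
def clip_segment_to_near(p0, p1, near):
    """Return the part of segment p0-p1 with z >= near, or None if nothing remains."""
    z0, z1 = p0[2], p1[2]
    if z0 < near and z1 < near:
        return None  # entirely too close: discard
    if z0 >= near and z1 >= near:
        return (p0, p1)  # entirely safe: keep unchanged
    # Exactly one endpoint is too close, so z1 != z0 and the division is safe.
    t = (near - z0) / (z1 - z0)
    hit = tuple(a + t * (b - a) for a, b in zip(p0, p1))
    return (hit, p1) if z0 < near else (p0, hit)


# A segment that starts behind the eye and ends in front of it.
print(clip_segment_to_near((0.0, 0.0, -1.0), (0.0, 0.0, 3.0), near=1.0))
# ((0.0, 0.0, 1.0), (0.0, 0.0, 3.0))
```

After clipping, every surviving point has z >= near > 0, so the perspective divide that follows never sees a zero or negative depth.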

Zooming in and walking closer feel the same, since both make your subject bigger, but geometrically they are completely different, and photographers exploit the difference constantly. Zooming narrows the frustum (a longer focal length f): it magnifies everything uniformly but leaves the relative depths untouched, so the background stays as far behind the subject as it was. Walking closer changes the object's z: it makes near things grow much faster than far things, dramatically stretching the sense of depth.

The famous "dolly zoom" in thriller films — where the character stays the same size but the corridor behind them seems to lunge or recede sickeningly — is made by walking the camera in while zooming out (or vice-versa) so the subject's projected size is held constant while the background's perspective divide changes underneath it. Confusing "field of view" with "camera distance" is one of the most common beginner mistakes in setting up a 3-D scene, and the dolly zoom is the proof that they are truly two different knobs.
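The dolly zoom falls straight out of x' = f x / z: to hold the subject's image height constant as the camera moves, f must scale in proportion to the subject's depth. The sketch below uses made-up numbers (a subject 2 units tall, a background object of the same height 10 units behind it) and hypothetical function names.

```python
def dolly_zoom_focal_length(subject_height, subject_depth, target_image_height):
    """Focal length that keeps the subject at target_image_height for a given depth."""
    return target_image_height * subject_depth / subject_height


def projected_height(height, depth, f):
    """Image-plane height of an object of the given real height at the given depth."""
    return f * height / depth


background_offset = 10.0  # background sits this far behind the subject (assumed)

for subject_depth in (8.0, 2.0):  # camera walks in from 8 units to 2 units
    f = dolly_zoom_focal_length(2.0, subject_depth, target_image_height=1.0)
    subject = projected_height(2.0, subject_depth, f)
    background = projected_height(2.0, subject_depth + background_offset, f)
    print(f"depth {subject_depth}: f = {f}, subject = {subject}, background = {background:.3f}")
```

In this run the subject's image height stays at 1.0 while the background's drops from about 0.444 to about 0.167: walking in while zooming out makes the background appear to recede.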