The Rendering Pipeline

Right now, sixty times a second, the machine in front of you is performing a small miracle: it takes a 3-D model (nothing but a list of corner points floating in space) and turns it into a flat grid of a few million coloured pixels, fast enough that a spinning character or a swooping camera looks perfectly smooth. A 4K display (3840 × 2160) holds roughly 8,300,000 pixels, and every single one of them has to be worked out again for the next frame, and the next, and the next.

How does a bag of 3-D points become a 2-D picture? Not in one leap. It flows down an ordered assembly line of stages called the rendering pipeline (or graphics pipeline). Each stage does one job and hands its result to the next, exactly like a factory conveyor belt: raw geometry goes in at one end, finished pixels come out the other. Understanding this one idea — the order of the stages and what each one is for — is the map that makes every other computer-graphics topic fall into place.

The stages, in order

The classic pipeline has a fixed shape. Data always flows the same way — you can never rasterize before you've projected, and you can never shade a pixel that doesn't exist yet:

vertices → model & view → projection → clipping → rasterization → fragment shading → framebuffer

Vertex data — the model arrives as a list of vertices (corner points), usually joined into triangles. This is the geometry, in the object's own local coordinates.
Model & view transforms — matrices move each vertex out of local space, into the shared world, and then into camera (eye) space, as seen from the viewer. (This is exactly where rotation matrices earn their keep.)
Projection — the 3-D scene is flattened onto a 2-D image plane, so far-away things look smaller. Depth is remembered separately for later.
Clipping — anything outside the viewing frustum (behind the camera, or off-screen) is thrown away or trimmed, so no effort is wasted on what you can't see.
Rasterization (scan conversion) — each triangle is turned into the set of pixel-sized fragments it covers. This is the jump from continuous geometry to a discrete grid.
Fragment shading — every fragment gets a colour, from lighting, textures and material properties. A depth test keeps only the nearest fragment at each pixel.
Framebuffer — the surviving colours are written into memory as the finished image, and that block of pixels is scanned out to the display.
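
The last three stages (rasterization, the depth test, and the write to the framebuffer) can be sketched in plain Python. This is a toy under stated assumptions, not how a GPU does it: the triangle is already in pixel coordinates, the whole triangle shares one depth value (real rasterizers interpolate depth across the triangle), the colour is a fixed label rather than a lighting result, and the names edge, make_buffers and rasterize are invented for illustration.

```python
def edge(a, b, p):
    """Signed area test: the sign says which side of the edge a->b the point p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def make_buffers(width, height):
    """Empty colour buffer plus a depth buffer initialised to 'infinitely far'."""
    colour = [[None] * width for _ in range(height)]
    depth = [[float("inf")] * width for _ in range(height)]
    return colour, depth

def rasterize(tri, depth, colour, colour_buf, depth_buf):
    """Turn one triangle into fragments and keep the nearest fragment per pixel."""
    a, b, c = tri
    for row in range(len(colour_buf)):
        for col in range(len(colour_buf[0])):
            p = (col + 0.5, row + 0.5)            # sample at the pixel centre
            w0, w1, w2 = edge(a, b, p), edge(b, c, p), edge(c, a, p)
            # Inside if p is on the same side of all three edges (either winding).
            inside = (w0 >= 0 and w1 >= 0 and w2 >= 0) or \
                     (w0 <= 0 and w1 <= 0 and w2 <= 0)
            # Depth test: only a nearer fragment may overwrite the pixel.
            if inside and depth < depth_buf[row][col]:
                depth_buf[row][col] = depth
                colour_buf[row][col] = colour

colour_buf, depth_buf = make_buffers(8, 8)
# Draw the near triangle first, then a larger far one that overlaps it.
rasterize([(0, 0), (4, 0), (0, 4)], 2.0, "blue", colour_buf, depth_buf)
rasterize([(0, 0), (8, 0), (0, 8)], 5.0, "orange", colour_buf, depth_buf)
```

Because of the depth test, the far orange triangle cannot overwrite the near blue one even though it is drawn later; pixels that no triangle covers keep the background value None.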

Worked example: following one corner through

Let's trace a single vertex — the tip of a triangle sitting at local coordinates (1, 0, 0) — all the way down the belt.

Vertex data: the corner enters as (1,0,0) in the model's own space.
Model transform: the object is turned (for instance a quarter-turn about the z-axis) and placed 4 units to the right and 5 units back, so the corner moves to a new world position, (4, 1, -5).
View transform: re-expressed relative to the camera. The important number now is depth: this corner is 5 units in front of the eye.
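Projection: the corner's sideways and vertical offsets from the camera axis are divided by that depth, which gives its 2-D position on the image plane. The division pulls far points toward the centre of the image more strongly than near ones, which is why distant objects look smaller. Its depth of 5 is stored for the depth test.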
Clipping: the corner is inside the view, so it survives untouched.
Rasterization: its 2-D position maps to a specific pixel, say column 640, row 360 — one fragment among the thousands the triangle covers.
Fragment shading: that fragment is lit and textured, coming out (for instance) a warm orange.
Framebuffer: if nothing nearer already occupies pixel (640, 360), the orange is written there — and it appears on screen.
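
The geometric part of this journey can be sketched in a few lines of Python. The text does not say how the object is turned or where the camera sits, so the sketch assumes a quarter-turn about the z-axis, a move of 4 units right and 5 units back, a camera at (4, 1, 0) looking down the negative z-axis, and a 1280 × 720 screen. That is one set of choices that reproduces the numbers in the example, not the only one. The helper names (mat_vec, translation, rotation_z, trace_vertex) are made up for illustration, and a real pipeline clips in homogeneous coordinates before the divide rather than testing a single point afterwards.

```python
import math

WIDTH, HEIGHT = 1280, 720  # assumed screen size

def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def translation(tx, ty, tz):
    """Matrix that moves a point by (tx, ty, tz)."""
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def rotation_z(angle):
    """Matrix that turns a point about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def trace_vertex(local):
    """Follow one vertex from local space to a pixel (hypothetical helper)."""
    v = list(local) + [1.0]                        # homogeneous coordinates
    # Model transform (assumed): quarter-turn about z, then 4 right, 5 back.
    world = mat_vec(translation(4, 0, -5), mat_vec(rotation_z(math.pi / 2), v))
    # View transform (assumed): camera at (4, 1, 0) looking down -z.
    eye = mat_vec(translation(-4, -1, 0), world)
    depth = -eye[2]                                # distance in front of the eye
    if depth <= 0:                                 # behind the camera: discard
        return {"world": world[:3], "depth": depth, "visible": False, "pixel": None}
    # Projection: divide the offsets by depth.
    ndc_x, ndc_y = eye[0] / depth, eye[1] / depth
    # Clipping, reduced here to a point-in-view test on the [-1, 1] square.
    visible = -1 <= ndc_x <= 1 and -1 <= ndc_y <= 1
    # Viewport mapping: [-1, 1] to pixel columns and rows (row 0 at the top).
    col = min(int((ndc_x + 1) / 2 * WIDTH), WIDTH - 1)
    row = min(int((1 - ndc_y) / 2 * HEIGHT), HEIGHT - 1)
    return {"world": world[:3], "depth": depth, "visible": visible,
            "pixel": (col, row) if visible else None}

result = trace_vertex([1.0, 0.0, 0.0])
```

Under these assumptions the corner ends up at world position (4, 1, -5), at depth 5, and at pixel (640, 360), matching the walk-through. Shading and the framebuffer write are left out because they depend on lights and materials the example does not specify.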

One corner, eight steps, a fraction of a microsecond. Now multiply that by every vertex of every triangle, and every pixel of every triangle, sixty times a second. The pipeline's whole reason for existing is to make that torrent of work regular — the same fixed sequence for everything — so hardware can run millions of these journeys in parallel.

“Rendering is just drawing.” It is not a single paint stroke. The picture you see is the output of a long chain of geometric and lighting computations. Most of the work — transforming, projecting, clipping, depth-testing — happens before any colour is ever placed. Thinking of it as “drawing” hides everything that actually makes 3-D graphics hard.
World space is not screen space. A vertex has several different sets of coordinates as it travels: local (model) space, world space, camera space, and finally pixel coordinates. Mixing them up (comparing a world position with a pixel position, say) is one of the most common beginner bugs. Always ask: which space am I in right now?
The pipeline runs per frame, not once. It is not a build step that happens when the program starts. The entire pipeline executes afresh for every single frame — which is exactly why a scene with more triangles, or a higher resolution, costs more to run at 60 frames per second.
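
The per-frame cost is easy to put numbers on. The sketch below is back-of-the-envelope arithmetic only: it counts one value per pixel per frame, which is a lower bound, since overlapping triangles can produce several fragments for the same pixel. The function names are invented for illustration.

```python
def frame_budget_ms(fps):
    """Time available to finish one whole frame, in milliseconds."""
    return 1000.0 / fps

def pixels_per_second(width, height, fps):
    """Lower bound on pixel values produced per second: every pixel, every frame."""
    return width * height * fps

for name, (w, h) in {"1080p": (1920, 1080), "4K": (3840, 2160)}.items():
    print(name, pixels_per_second(w, h, 60), "pixels/s at 60 fps")
print(round(frame_budget_ms(60), 2), "ms per frame")
```

At 60 frames per second the whole pipeline has about 16.7 ms per frame, and a 4K image needs 497,664,000 pixel values per second, four times as many as 1080p.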

The earliest 3-D hardware had a fixed-function pipeline: the stages were baked into silicon and you could only tweak a few dials. Modern GPUs made two of the stages, the vertex transform and the fragment shading, programmable. You write little programs called shaders that run once per vertex and once per fragment, massively in parallel. The order of the stages, though, is still exactly the flow on this page; what changed is that you now get to write the code for two of the boxes rather than accept the factory settings.
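
The idea of two programmable boxes inside a fixed flow can be sketched in Python. Real shaders are written in a GPU shading language such as GLSL or HLSL and run on the graphics chip; here ordinary functions stand in for them, the stage in the middle is a deliberately crude stand-in for the fixed stages (it emits one fragment per point), and every name is hypothetical.

```python
def fixed_rasterize_points(positions):
    """Stand-in for the fixed stages: one fragment per 2-D point."""
    return [(round(x), round(y)) for x, y in positions]

def draw(vertices, vertex_shader, fragment_shader):
    """Fixed order of stages; only the two shader functions are supplied by the user."""
    positions = [vertex_shader(v) for v in vertices]             # once per vertex
    fragments = fixed_rasterize_points(positions)                # not programmable
    return {frag: fragment_shader(frag) for frag in fragments}   # once per fragment

# Two user-written "shaders" for the programmable boxes.
def shift_right(vertex):
    x, y = vertex
    return (x + 2.0, y)

def orange(fragment):
    return (255, 140, 0)

def gradient(fragment):
    col, row = fragment
    return (col * 10, row * 10, 0)

vertices = [(0.0, 0.0), (1.0, 3.0)]
flat_image = draw(vertices, shift_right, orange)
shaded_image = draw(vertices, shift_right, gradient)
```

Swapping orange for gradient changes the colours that come out without touching draw, which is the point: the order of the stages stays fixed while the contents of two of them are replaced.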

And the parallelism is staggering. Because every vertex is transformed independently, and every fragment is shaded independently, a GPU can chew through thousands of them at once. That is the deep reason graphics chips look nothing like a CPU: the pipeline is an embarrassingly parallel assembly line, and the hardware is built to exploit it.
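
What makes the work embarrassingly parallel is that no vertex needs to know anything about any other vertex. The sketch below illustrates only that independence: it applies the same made-up transform serially and through a thread pool from Python's standard library and gets identical results. It is not a performance demonstration, since Python threads will not speed up arithmetic like this, and a GPU schedules its work very differently.

```python
from concurrent.futures import ThreadPoolExecutor

def transform(vertex):
    """A per-vertex job that reads nothing but its own input (assumed offsets)."""
    x, y, z = vertex
    return (x + 4.0, y + 1.0, z - 5.0)

def transform_serial(vertices):
    """One vertex after another."""
    return [transform(v) for v in vertices]

def transform_parallel(vertices, workers=4):
    """The same jobs handed to several workers; map keeps the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transform, vertices))

vertices = [(float(i), 0.0, 0.0) for i in range(1000)]
same = transform_serial(vertices) == transform_parallel(vertices)
```

Because the jobs share no state, the order in which they actually run does not matter, and that is what lets hardware spread them across thousands of small processing units.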