The Rendering Pipeline
Right now, sixty times a second, the machine in front of you is performing a small miracle: it
takes a 3-D model — nothing but a list of corner points floating in space — and
turns it into a flat grid of a few million coloured pixels, fast enough that a
spinning character or a swooping camera looks perfectly smooth. A 4K display (3840 × 2160) holds roughly
8.3 million pixels, and every single one of them has to be worked out
again for the next frame, and the next, and the next.
How does a bag of 3-D points become a 2-D picture? Not in one leap. It flows down an ordered
assembly line of stages called the rendering pipeline (or graphics
pipeline). Each stage does one job and hands its result to the next, exactly like a factory
conveyor belt: raw geometry goes in at one end, finished pixels come out the other. Understanding
this one idea — the order of the stages and what each one is for — is the map that makes
every other computer-graphics topic fall into place.
The stages, in order
The classic pipeline has a fixed shape. Data always flows the same way — you can never rasterize
before you've projected, and you can never shade a pixel that doesn't exist yet:
vertices → model & view → projection → clipping → rasterization → fragment shading → framebuffer
- Vertex data — the model arrives as a list of vertices (corner points), usually joined into triangles. This is the geometry, in the object's own local coordinates.
- Model & view transforms — matrices move each vertex out of local space, into the shared world, and then into camera (eye) space, as seen from the viewer. (This is exactly where rotation matrices earn their keep.)
- Projection — the 3-D scene is flattened onto a 2-D image plane, so far-away things look smaller. Depth is remembered separately for later.
- Clipping — anything outside the viewing frustum (behind the camera, or off-screen) is thrown away or trimmed, so no effort is wasted on what you can't see.
- Rasterization (scan conversion) — each triangle is turned into the set of pixel-sized fragments it covers. This is the jump from continuous geometry to a discrete grid.
- Fragment shading — every fragment gets a colour, from lighting, textures and material properties. A depth test keeps only the nearest fragment at each pixel.
- Framebuffer — the surviving colours are written into memory as the finished image, and that block of pixels is scanned out to the display.
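The last three stages can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not how a GPU is actually programmed: the grid is a toy 8 × 8 image, colours are single characters, each triangle carries one constant depth (real pipelines interpolate depth per fragment), and the names `edge`, `rasterize` and `draw` are made up for this example. Coverage is decided with an edge-function test, which is one common way to do scan conversion.

```python
def edge(a, b, p):
    """Signed area test: the sign says which side of the edge a -> b the point p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(tri, width, height):
    """Yield (x, y) for every pixel whose centre lies inside the 2-D triangle."""
    a, b, c = tri
    area = edge(a, b, c)
    if area == 0:
        return  # a degenerate triangle covers nothing
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel centre
            weights = (edge(b, c, p), edge(c, a, p), edge(a, b, p))
            # Inside when all three edge tests agree in sign with the triangle's area.
            if all(w * area >= 0 for w in weights):
                yield x, y


def draw(tri, depth, colour, colour_buf, depth_buf):
    """Rasterize, 'shade' with a flat colour, and keep only the nearest fragment."""
    for x, y in rasterize(tri, len(colour_buf[0]), len(colour_buf)):
        if depth < depth_buf[y][x]:  # depth test: smaller means nearer
            depth_buf[y][x] = depth
            colour_buf[y][x] = colour


WIDTH, HEIGHT = 8, 8
colour_buf = [["." for _ in range(WIDTH)] for _ in range(HEIGHT)]
depth_buf = [[float("inf")] * WIDTH for _ in range(HEIGHT)]

draw([(0, 0), (8, 0), (0, 8)], 5.0, "o", colour_buf, depth_buf)  # far triangle
draw([(0, 0), (4, 0), (0, 4)], 2.0, "n", colour_buf, depth_buf)  # nearer, overlapping
image = ["".join(row) for row in colour_buf]
```

Printing the rows of `image` shows the nearer triangle ("n") covering the corner of the farther one ("o"), whichever order the two are drawn in.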
Worked example: following one corner through
Let's trace a single vertex — the tip of a triangle sitting at local coordinates
(1, 0, 0) — all the way down the belt.
- Vertex data: the corner enters as (1,0,0) in the model's own space.
- Model transform: the object is turned a quarter-turn about the z-axis, then placed 4 units to the right and 5 units back, so the corner moves to the world position (4, 1, -5).
- View transform: re-expressed relative to the camera. The important number now is depth: this corner is 5 units in front of the eye.
- Projection: the screen offsets are divided by that depth, so the corner lands at a 2-D image-plane position. A nearer point with the same offsets would land farther from the centre of the image, a more distant one closer to it. Its depth of 5 is stored aside for the depth test.
- Clipping: the corner is inside the view, so it survives untouched.
- Rasterization: its 2-D position maps to a specific pixel, say column 640, row 360 — one fragment among the thousands the triangle covers.
- Fragment shading: that fragment is lit and textured, coming out (for instance) a warm orange.
- Framebuffer: if nothing nearer already occupies pixel (640, 360), the orange is written there — and it appears on screen.
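The geometric half of this journey can be checked with a short sketch. Several things here are assumptions chosen for illustration: the model transform is taken to be a quarter-turn about the z-axis followed by a shift of (4, 0, -5), which reproduces the world position (4, 1, -5); the camera is assumed to sit at the world origin looking down the negative z-axis, so the view transform is the identity; and the image plane is assumed to lie 1 unit in front of the eye. The helper `transform` is a hypothetical name, not a library call.

```python
def transform(matrix, point):
    """Apply a 4x4 matrix (row-major nested lists) to a 3-D point, treating w as 1."""
    column = (point[0], point[1], point[2], 1)
    out = [sum(m * v for m, v in zip(row, column)) for row in matrix]
    return tuple(out[:3])


# Assumed model transform: quarter-turn about z, then translate by (4, 0, -5).
MODEL = [
    [0, -1, 0, 4],
    [1, 0, 0, 0],
    [0, 0, 1, -5],
    [0, 0, 0, 1],
]

# Assumed view transform: camera at the origin looking down -z, so nothing moves.
VIEW = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

local = (1, 0, 0)                 # vertex data, in the model's own space
world = transform(MODEL, local)   # model transform
eye = transform(VIEW, world)      # view transform
depth = -eye[2]                   # distance in front of the eye
projected = (eye[0] / depth, eye[1] / depth)  # perspective divide onto the image plane

print(world, depth, projected)
```

Under these assumptions the corner ends up at (0.8, 0.2) on the image plane with depth 5. Doubling the depth while keeping the same offsets would halve both image-plane coordinates, which is why distant things look smaller.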
One corner, eight steps, a fraction of a microsecond. Now multiply that by every vertex of every
triangle, and every fragment of every triangle, sixty times a second. The pipeline's whole reason for
existing is to make that torrent of work regular, the same fixed sequence for everything,
so hardware can run millions of these journeys in parallel.
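To put a rough number on that torrent, here is the arithmetic for the pixel side alone. It assumes a 3840 × 2160 (4K) display at 60 frames per second and counts each pixel exactly once per frame, so it is a floor rather than an estimate: overlapping triangles produce extra fragments that are shaded and then discarded.

```python
WIDTH, HEIGHT = 3840, 2160      # assumed 4K resolution
FRAMES_PER_SECOND = 60

pixels_per_frame = WIDTH * HEIGHT                           # 8,294,400
pixels_per_second = pixels_per_frame * FRAMES_PER_SECOND    # 497,664,000
time_budget_ms = 1000 / FRAMES_PER_SECOND                   # time available per frame

print(f"{pixels_per_frame:,} pixels per frame")
print(f"{pixels_per_second:,} pixels per second")
print(f"{time_budget_ms:.1f} ms per frame")
```

That is close to half a billion pixel results every second, each frame finished in under 17 milliseconds.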
- “Rendering is just drawing.” It is not a single paint stroke. The picture you see is the output of a long chain of geometric and lighting computations. Most of the work (transforming, projecting, clipping, depth-testing) happens before any colour is ever placed. Thinking of it as “drawing” hides everything that actually makes 3-D graphics hard.
- World space is not screen space. A vertex has several different sets of coordinates as it travels: local (model) space, world space, camera space, and finally pixel coordinates. Mixing them up, for example comparing a world position with a pixel position, is one of the most common beginner bugs. Always ask: which space am I in right now?
- The pipeline runs per frame, not once. It is not a build step that happens when the program starts. The entire pipeline executes afresh for every single frame, which is exactly why a scene with more triangles, or a higher resolution, costs more to run at 60 frames per second.
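To make the coordinate-space point concrete, here is a sketch of the very last conversion, from the projected image plane to pixel coordinates. It assumes the projected position has already been normalised to the range -1 to 1 on both axes, that the screen is 1280 × 720, and that row 0 is the top of the screen (so y has to be flipped). The function name `to_pixel` is hypothetical.

```python
def to_pixel(ndc_x, ndc_y, width, height):
    """Map normalised coordinates in [-1, 1] to an integer (column, row) pixel.

    Assumes the point has already survived clipping, so both inputs are in range.
    """
    col = int((ndc_x + 1) / 2 * width)
    row = int((1 - ndc_y) / 2 * height)  # flip y: +1 is the top row
    # Clamp so that an input of exactly +1 or -1 still lands on the last pixel.
    return min(col, width - 1), min(row, height - 1)


centre = to_pixel(0.0, 0.0, 1280, 720)
top_left = to_pixel(-1.0, 1.0, 1280, 720)
bottom_right = to_pixel(1.0, -1.0, 1280, 720)
print(centre, top_left, bottom_right)
```

The same point is (0, 0) in one space and (640, 360) in another, which is exactly why comparing numbers across spaces goes wrong.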
The earliest 3-D hardware had a fixed-function pipeline: the stages were baked
into silicon and you could only tweak a few dials. Modern GPUs made two of the stages —
the vertex transform and the fragment shading — programmable. You write little
programs called shaders that run once per vertex and once per fragment, in massive
parallel. The order of the stages, though, is still exactly the flow on this page; what
changed is that you now get to write the code for two of the boxes rather than accept the
factory settings.
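The idea of fixed order with swappable boxes can be mimicked in Python. This is only an analogy: real shaders are written in dedicated shading languages and run on the GPU, and every name below (`fixed_pipeline`, `push_back`, `flat_orange`, `fade_with_depth`) is invented for the example. Rasterization is replaced by a stand-in that emits one fragment per vertex.

```python
def fixed_pipeline(vertices, vertex_shader, fragment_shader):
    """The order is fixed; only the two shader functions are supplied by the caller."""
    transformed = [vertex_shader(v) for v in vertices]   # runs once per vertex
    fragments = transformed                              # stand-in for rasterization
    return [fragment_shader(f) for f in fragments]       # runs once per fragment


def push_back(vertex):
    """Example vertex shader: move the vertex 4 units away from the camera."""
    x, y, z = vertex
    return (x, y, z - 4)


def flat_orange(fragment):
    """Example fragment shader: the same orange everywhere."""
    return (255, 165, 0)


def fade_with_depth(fragment):
    """Example fragment shader: darken the orange as the fragment gets farther away."""
    brightness = 4 / (4 + abs(fragment[2]))
    return tuple(int(channel * brightness) for channel in (255, 165, 0))


vertices = [(1, 0, 0), (0, 1, 0)]
flat = fixed_pipeline(vertices, push_back, flat_orange)
faded = fixed_pipeline(vertices, push_back, fade_with_depth)
print(flat)
print(faded)
```

Swapping `flat_orange` for `fade_with_depth` changes the colours that come out, but `fixed_pipeline` itself, and so the order of the stages, is untouched.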
And the parallelism is staggering. Because every vertex is transformed independently, and every
fragment is shaded independently, a GPU can chew through thousands of them at once. That is the
deep reason graphics chips look nothing like a CPU: the pipeline is an embarrassingly
parallel assembly line, and the hardware is built to exploit it.