One vertex, end to end, line by line
Step 1 — list the spaces in order. A vertex is reborn in a new coordinate system at
each stage:
\text{object} \xrightarrow{\;M\;} \text{world} \xrightarrow{\;V\;} \text{camera} \xrightarrow{\;P\;} \text{clip} \xrightarrow{\;\div w\;} \text{NDC} \xrightarrow{\;\text{viewport}\;} \text{screen}.
Step 2 — place it in the world with the model matrix. The
model matrix M carries the vertex from the mesh's private object space into the shared
world:
\vec{x}_{\text{world}} = M\,\vec{x}_{\text{object}}.
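As a minimal sketch of Step 2, assume the model matrix is a pure translation (a real one usually also rotates and scales). The helper names `translation` and `mat_vec` and the placement values are illustrative assumptions, not taken from any particular engine. The object-space point is written in homogeneous form with w = 1.

```python
def translation(tx, ty, tz):
    """4x4 matrix (list of rows) that moves a point by (tx, ty, tz)."""
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


# Hypothetical placement: the mesh sits 3 units along +x and 2 units along -z.
M = translation(3.0, 0.0, -2.0)

x_object = [0.0, 1.0, 0.0, 1.0]  # the point (0, 1, 0) with w = 1
x_world = mat_vec(M, x_object)
print(x_world)  # [3.0, 1.0, -2.0, 1.0]
```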
Step 3 — look at it through the camera. The
view matrix V re-expresses the world from the camera's point of view:
\vec{x}_{\text{camera}} = V\,\vec{x}_{\text{world}} = V M\,\vec{x}_{\text{object}}.
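One common way to build V is a "look-at" construction. The sketch below assumes a right-handed camera that looks down its own -z axis (the OpenGL-style convention) and an `up` vector that is not parallel to the viewing direction. The function name `look_at`, the camera position, and the world-space point are all illustrative assumptions.

```python
import math


def sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def normalize(a):
    length = math.sqrt(dot(a, a))
    return [c / length for c in a]


def look_at(eye, target, up):
    """View matrix for a camera at `eye` looking toward `target`."""
    f = normalize(sub(target, eye))  # forward direction
    s = normalize(cross(f, up))      # camera's right
    u = cross(s, f)                  # camera's true up
    return [
        [s[0], s[1], s[2], -dot(s, eye)],
        [u[0], u[1], u[2], -dot(u, eye)],
        [-f[0], -f[1], -f[2], dot(f, eye)],
        [0.0, 0.0, 0.0, 1.0],
    ]


def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


# Hypothetical camera: 5 units up the +z axis, looking at the origin.
V = look_at(eye=[0.0, 0.0, 5.0], target=[0.0, 0.0, 0.0], up=[0.0, 1.0, 0.0])

x_world = [3.0, 1.0, -2.0, 1.0]
x_camera = mat_vec(V, x_world)
print(x_camera)  # z is about -7: the point sits 7 units in front of the camera
```

In this setup the camera is not rotated, so V reduces to a translation by the negated camera position.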
Step 4 — project to clip space. The
projection matrix P sets up the perspective by copying the camera-space depth into w
(with the common OpenGL-style matrix, w_{\text{clip}} = -z_{\text{camera}}):
\vec{x}_{\text{clip}} = P\,\vec{x}_{\text{camera}} = P V M\,\vec{x}_{\text{object}}.
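The sketch below shows one widely used form of P: an OpenGL-style perspective matrix for a camera looking down -z, with NDC depth in [-1, 1]. Other APIs use different depth ranges and handedness, so treat the layout as an assumption. The function name `perspective` and the parameter values are illustrative. The point to notice is the bottom row, which places -z into w.

```python
import math


def perspective(fov_y_deg, aspect, near, far):
    """OpenGL-style perspective matrix (assumed convention, see lead-in)."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],  # this row writes -z_camera into w_clip
    ]


def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


# Hypothetical lens: 90 degree vertical field of view, square aspect.
P = perspective(fov_y_deg=90.0, aspect=1.0, near=1.0, far=9.0)

x_camera = [3.0, 1.0, -7.0, 1.0]  # 7 units in front of the camera
x_clip = mat_vec(P, x_camera)
print(x_clip[3])  # 7.0
```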
Step 5 — collapse the three into one matrix. Matrix multiplication is associative, so
bake the trio together once and apply a single matrix per vertex:
\mathrm{MVP} = P \cdot V \cdot M, \qquad \vec{x}_{\text{clip}} = \mathrm{MVP}\,\vec{x}_{\text{object}}.
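A small numerical check of the associativity argument: applying M, V and P one at a time gives the same clip-space vector as applying the baked product. The three matrices are hypothetical (a translation for M, a camera 5 units up +z for V, and a 90 degree, near 1, far 9 OpenGL-style perspective for P), chosen so that every value is exact in floating point.

```python
def mat_mul(a, b):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [
        [sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
        for r in range(4)
    ]


def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


# Hypothetical matrices, see the lead-in for what each one represents.
M = [[1.0, 0.0, 0.0, 3.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, -2.0], [0.0, 0.0, 0.0, 1.0]]
V = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, -5.0], [0.0, 0.0, 0.0, 1.0]]
P = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, -1.25, -2.25], [0.0, 0.0, -1.0, 0.0]]

MVP = mat_mul(P, mat_mul(V, M))  # baked once

x_object = [0.0, 1.0, 0.0, 1.0]
step_by_step = mat_vec(P, mat_vec(V, mat_vec(M, x_object)))  # three multiplies
baked = mat_vec(MVP, x_object)                               # one multiply

print(step_by_step)  # [3.0, 1.0, 6.5, 7.0]
print(baked)         # [3.0, 1.0, 6.5, 7.0]
```

The saving is per vertex: the two matrix-matrix products happen once per object, after which each vertex costs one matrix-vector multiply instead of three.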
Step 6 — read the order off the product. With column vectors the
rightmost matrix acts first. Reading
\mathrm{MVP}\,\vec{x} = P(V(M\vec{x})) right-to-left, the vertex is
modelled, then viewed, then projected — exactly the order of Steps 2–4, even though
P is written on the left.
Step 7 — finish with the fixed-function steps. The programmable matrices stop at clip
space. The hardware then does the perspective divide (by the clip-space w) and the
viewport transform:
\vec{x}_{\text{NDC}} = \frac{\vec{x}_{\text{clip}}}{w}, \qquad \vec{x}_{\text{screen}} = \text{viewport}\big(\vec{x}_{\text{NDC}}\big).
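The two fixed-function steps can be imitated in a few lines. This is a sketch under stated assumptions: NDC x and y run from -1 to 1, the window is 800 by 600, y is flipped so that pixel row 0 is at the top, and depth is remapped to [0, 1]. Real APIs differ on the y direction and depth range. The names `perspective_divide` and `viewport` and the clip-space input are illustrative.

```python
def perspective_divide(clip):
    """Divide x, y, z by w to go from clip space to NDC."""
    x, y, z, w = clip
    return [x / w, y / w, z / w]


def viewport(ndc, width, height):
    """Map NDC in [-1, 1] to window coordinates (assumed top-left origin)."""
    x, y, z = ndc
    screen_x = (x + 1.0) * 0.5 * width
    screen_y = (1.0 - y) * 0.5 * height  # flip so +y in NDC points up on screen
    depth = (z + 1.0) * 0.5              # remap depth to [0, 1]
    return [screen_x, screen_y, depth]


x_clip = [3.5, 1.75, 6.5, 7.0]  # hypothetical clip-space vertex
x_ndc = perspective_divide(x_clip)
x_screen = viewport(x_ndc, width=800, height=600)

print(x_ndc[:2])     # [0.5, 0.25]
print(x_screen[:2])  # [600.0, 225.0]
```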
Step 8 — trace a concrete vertex. A teapot's spout vertex lives at
(0, 1, 0) in object space; M sets it down in the
scene; V swings it into the camera's frame; P
loads its depth into w; the divide shrinks it for distance; the viewport
plants it on, say, pixel (812, 339). One vertex, six coordinate systems, a
couple of microseconds.
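The same trace can be followed numerically. Every matrix and the 800 by 600 window below are hypothetical stand-ins, so the resulting pixel will not match the (812, 339) quoted above. The sketch only shows the vertex passing through all six spaces in order, assuming column vectors, an OpenGL-style projection, and a top-left pixel origin.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


# Hypothetical scene setup (not the values behind the pixel quoted in the text).
M = [[1.0, 0.0, 0.0, 3.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, -2.0], [0.0, 0.0, 0.0, 1.0]]
V = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, -5.0], [0.0, 0.0, 0.0, 1.0]]
P = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, -1.25, -2.25], [0.0, 0.0, -1.0, 0.0]]
WIDTH, HEIGHT = 800, 600

x_object = [0.0, 1.0, 0.0, 1.0]        # the spout vertex (0, 1, 0), w = 1
x_world = mat_vec(M, x_object)         # placed in the scene
x_camera = mat_vec(V, x_world)         # seen from the camera
x_clip = mat_vec(P, x_camera)          # depth loaded into w
w = x_clip[3]
x_ndc = [c / w for c in x_clip[:3]]    # perspective divide
pixel = (
    int((x_ndc[0] + 1.0) * 0.5 * WIDTH),
    int((1.0 - x_ndc[1]) * 0.5 * HEIGHT),  # y flipped: row 0 at the top
)

stages = [
    ("object", x_object),
    ("world", x_world),
    ("camera", x_camera),
    ("clip", x_clip),
    ("NDC", x_ndc),
    ("screen", pixel),
]
for name, value in stages:
    print(f"{name:>7}: {value}")
```

With these stand-in values the vertex lands on pixel (571, 257).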
Every vertex follows the same route from mesh to pixel:
- The spaces, in order: object \to world (Model) \to camera (View) \to clip (Projection) \to NDC (divide) \to screen (Viewport).
- The three matrices combine into one: \mathrm{MVP} = P \cdot V \cdot M.
- It is applied right-to-left with column vectors: \mathrm{MVP}\,\vec{x} = P(V(M\vec{x})), so M acts first and P last.
- One matrix per object, run on every vertex; the divide and viewport are fixed-function and finish the job.
The three matrices update on three different clocks, which is exactly why engines keep them
separate until the last moment:
- Model M: per object. Each mesh has its own placement; a thousand objects means a thousand model matrices, each rebuilt when that object moves.
- View V: per camera. One matrix for the whole scene's viewpoint, rebuilt only when the camera moves.
- Projection P: per lens or window change (rarely). It changes only when the field of view or aspect ratio does, for example on a window resize or a zoom.
Engines therefore upload V and P once and loop over objects, multiplying in each object's
M to form \mathrm{MVP} = P V M. Getting that multiplication order wrong, for instance
writing M V P while still using column vectors, is one of the most common bugs in a fresh
renderer, and the symptom (a scene that vanishes or smears) is gloriously unhelpful.
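A sketch of that loop, followed by the ordering bug, under the same assumptions as the rest of this walkthrough (column vectors, OpenGL-style projection). The scene contents, the helper names, and the choice to pre-multiply P and V once per frame are illustrative, not a description of any specific engine.

```python
def translation(tx, ty, tz):
    """4x4 translation matrix stored as a list of rows."""
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty], [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


def mat_mul(a, b):
    """Product of two 4x4 matrices."""
    return [
        [sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
        for r in range(4)
    ]


def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


# Slow clocks: one view and one projection for the whole frame.
V = translation(0.0, 0.0, -5.0)  # hypothetical camera 5 units up +z
P = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, -1.25, -2.25], [0.0, 0.0, -1.0, 0.0]]
PV = mat_mul(P, V)               # built once, reused for every object

# Fast clock: one model matrix per object (hypothetical scene).
scene = {
    "teapot": translation(3.0, 0.0, -2.0),
    "cup": translation(-1.0, 0.0, -4.0),
}
vertex = [0.0, 1.0, 0.0, 1.0]

clip = {}
for name, M in scene.items():
    MVP = mat_mul(PV, M)         # P * V * M: M acts on the vertex first
    clip[name] = mat_vec(MVP, vertex)
    print(name, clip[name])

# The bug: same three matrices, multiplied as M * V * P.
M = scene["teapot"]
wrong = mat_vec(mat_mul(mat_mul(M, V), P), vertex)
print("wrong order", wrong)      # w is 0 here, so the divide by w breaks down
```

In this particular setup the reversed product projects first and translates afterwards, leaving w = 0 for the vertex. That is one concrete way a scene can vanish without any error being reported.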