The MVP Pipeline

You have now met every stage on its own. Time to put a single vertex on the conveyor belt and watch it travel all the way from a modelling tool to a glowing pixel. The journey has a name every graphics programmer mutters in their sleep: the MVP pipeline — Model, View, Projection — followed by the two fixed-function steps that finish the job.

One vertex, end to end, line by line

Step 1 — list the spaces in order. A vertex is reborn in a new coordinate system at each stage:

\text{object} \xrightarrow{\;M\;} \text{world} \xrightarrow{\;V\;} \text{camera} \xrightarrow{\;P\;} \text{clip} \xrightarrow{\;\div w\;} \text{NDC} \xrightarrow{\;\text{viewport}\;} \text{screen}.

Step 2 — place it in the world with the model matrix. The model matrix M carries the vertex from the mesh's private object space into the shared world:

\vec{x}_{\text{world}} = M\,\vec{x}_{\text{object}}.
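As a minimal sketch of Step 2, assume the model matrix is a pure translation. The placement (2, 0, -5) and the helper names `translation` and `mat_vec` are illustrative choices, not taken from any particular library.

```python
def translation(tx, ty, tz):
    """4x4 translation matrix for column vectors (offset in the last column)."""
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

# Spout vertex in object space; w = 1 marks it as a point, not a direction.
x_object = [0.0, 1.0, 0.0, 1.0]

# Hypothetical placement of the teapot in the scene.
M = translation(2.0, 0.0, -5.0)

x_world = mat_vec(M, x_object)
print(x_world)  # [2.0, 1.0, -5.0, 1.0]
```

A real model matrix usually also carries rotation and scale; the translation-only case just keeps the arithmetic easy to check by hand.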

Step 3 — look at it through the camera. The view matrix V re-expresses the world from the camera's point of view:

\vec{x}_{\text{camera}} = V\,\vec{x}_{\text{world}} = V M\,\vec{x}_{\text{object}}.
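A sketch of Step 3 under a simplifying assumption: the camera sits at a hypothetical world position (0, 1, 3) and is not rotated, so the view matrix is simply the inverse of the camera's translation. A rotated camera would need the inverse of its full world transform. All helper names here are illustrative.

```python
def translation(tx, ty, tz):
    """4x4 translation matrix for column vectors."""
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]

def mat_mul(a, b):
    """Product of two 4x4 matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

# Hypothetical camera position in world space, with no rotation.
camera_position = (0.0, 1.0, 3.0)

# Moving the camera by +t is the same as moving the world by -t.
V = translation(-camera_position[0], -camera_position[1], -camera_position[2])

x_world = [2.0, 1.0, -5.0, 1.0]
x_camera = mat_vec(V, x_world)
print(x_camera)  # [2.0, 0.0, -8.0, 1.0]

# Same result straight from object space through the combined matrix V M.
M = translation(2.0, 0.0, -5.0)
x_object = [0.0, 1.0, 0.0, 1.0]
print(mat_vec(mat_mul(V, M), x_object) == x_camera)  # True
```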

Step 4 — project to clip space. The projection matrix P sets up perspective by copying the camera-space depth into w, so that the later divide by w can shrink distant points:

\vec{x}_{\text{clip}} = P\,\vec{x}_{\text{camera}} = P V M\,\vec{x}_{\text{object}}.
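To make "writing the depth into w" concrete, here is a sketch of one common perspective matrix. It assumes an OpenGL-style convention (camera looking down the negative z axis, clip-space depth in [-w, w]); other APIs use different signs and depth ranges. The function name `perspective` and its parameters are chosen for illustration.

```python
import math

def perspective(fov_y_deg, aspect, near, far):
    """OpenGL-style perspective matrix for column vectors (assumed convention)."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],  # this row copies -z_camera into w_clip
    ]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

# A point 8 units in front of the camera (negative z is "forward" here).
x_camera = [2.0, 0.0, -8.0, 1.0]

P = perspective(fov_y_deg=90.0, aspect=1.0, near=0.1, far=100.0)
x_clip = mat_vec(P, x_camera)

print(x_clip[3])  # 8.0, the distance in front of the camera
```

Note that nothing has shrunk yet: the matrix only stores the depth in w. The shrinking happens in the divide that follows.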

Step 5 — collapse the three into one matrix. Matrix multiplication is associative, so bake the trio together once and apply a single matrix per vertex:

\mathrm{MVP} = P \cdot V \cdot M, \qquad \vec{x}_{\text{clip}} = \mathrm{MVP}\,\vec{x}_{\text{object}}.

Step 6 — read the order off the product. With column vectors the rightmost matrix acts first. Reading \mathrm{MVP}\,\vec{x} = P(V(M\vec{x})) right-to-left, the vertex is modelled, then viewed, then projected — exactly the order of Steps 2–4, even though P is written on the left.
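The claims in Steps 5 and 6 can be checked numerically. The sketch below uses three small stand-in matrices (a uniform scale for M, a translation for V, and a stripped-down projection that only copies -z into w); they are chosen so the arithmetic is exact, not because a real scene would use them.

```python
def mat_mul(a, b):
    """Product of two 4x4 matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

# Stand-in matrices (illustrative only).
M = [[2.0, 0.0, 0.0, 0.0],   # uniform scale by 2
     [0.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 2.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
V = [[1.0, 0.0, 0.0, 0.0],   # push the world 3 units along -z
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, -3.0],
     [0.0, 0.0, 0.0, 1.0]]
P = [[1.0, 0.0, 0.0, 0.0],   # copy -z into w, leave x, y, z alone
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, -1.0, 0.0]]

x = [0.0, 1.0, 0.0, 1.0]

# Three separate multiplies, innermost (M) first.
step_by_step = mat_vec(P, mat_vec(V, mat_vec(M, x)))

# One baked matrix, written P V M.
combined = mat_vec(mat_mul(P, mat_mul(V, M)), x)

# The reversed product M V P, for comparison.
reversed_order = mat_vec(mat_mul(M, mat_mul(V, P)), x)

print(step_by_step)    # [0.0, 2.0, -3.0, 3.0]
print(combined)        # [0.0, 2.0, -3.0, 3.0]
print(reversed_order)  # [0.0, 2.0, 0.0, 0.0]
```

The reversed product ends with w = 0, so the perspective divide would have nothing sensible to divide by.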

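Step 7 — finish with the fixed-function steps. The programmable matrices stop at clip space. The hardware then performs the perspective divide, which divides the clip-space coordinates by their own w component, and the viewport transform, which maps NDC to pixel coordinates: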

\vec{x}_{\text{NDC}} = \frac{\vec{x}_{\text{clip}}}{w}, \qquad \vec{x}_{\text{screen}} = \text{viewport}\big(\vec{x}_{\text{NDC}}\big).
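The two fixed-function steps are simple enough to write out by hand. The sketch below assumes NDC x and y run from -1 to 1 and that the pixel origin is the top-left corner with y pointing down. That is one common convention, and some APIs place the origin at the bottom-left instead. The function names are illustrative.

```python
def perspective_divide(x_clip):
    """Divide x, y, z by the clip-space w to get normalized device coordinates."""
    x, y, z, w = x_clip
    return [x / w, y / w, z / w]

def viewport(x_ndc, width, height):
    """Map NDC x, y in [-1, 1] to pixel coordinates (top-left origin assumed)."""
    sx = (x_ndc[0] + 1.0) * 0.5 * width
    sy = (1.0 - x_ndc[1]) * 0.5 * height  # flip y: NDC up becomes screen down
    return [sx, sy]

# An illustrative clip-space point with w = 8.
x_clip = [2.0, 4.0, 6.0, 8.0]

x_ndc = perspective_divide(x_clip)
x_screen = viewport(x_ndc, width=1920, height=1080)

print(x_ndc)     # [0.25, 0.5, 0.75]
print(x_screen)  # [1200.0, 270.0]
```

On real hardware, clipping against the view volume happens before the divide, which is why points with w at or near zero never reach this stage.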

Step 8 — trace a concrete vertex. A teapot's spout vertex lives at (0, 1, 0) in object space; M sets it down in the scene; V swings it into the camera's frame; P loads its depth into w; the divide shrinks it for distance; the viewport plants it on, say, pixel (812, 339). One vertex, six coordinate systems, one trip through the pipeline.
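The whole trip can be sketched in a few dozen lines of plain Python. Every number here is an assumption made for illustration (a translation-only M and V, a 90 degree OpenGL-style projection, a 1920 by 1080 viewport with a top-left origin), so the pixel it lands on differs from the illustrative one in the text.

```python
import math

def translation(tx, ty, tz):
    """4x4 translation matrix for column vectors."""
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]

def perspective(fov_y_deg, aspect, near, far):
    """OpenGL-style perspective matrix (assumed convention)."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [[f / aspect, 0.0, 0.0, 0.0],
            [0.0, f, 0.0, 0.0],
            [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
            [0.0, 0.0, -1.0, 0.0]]

def mat_mul(a, b):
    """Product of two 4x4 matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

WIDTH, HEIGHT = 1920, 1080

M = translation(2.0, 0.0, -5.0)    # hypothetical teapot placement
V = translation(0.0, -1.0, -3.0)   # unrotated camera at (0, 1, 3)
P = perspective(90.0, WIDTH / HEIGHT, 0.1, 100.0)
MVP = mat_mul(P, mat_mul(V, M))    # baked once: P V M

x_object = [0.0, 1.0, 0.0, 1.0]    # the spout vertex
x_clip = mat_vec(MVP, x_object)    # programmable part ends here

w = x_clip[3]
x_ndc = [c / w for c in x_clip[:3]]            # perspective divide
pixel = (round((x_ndc[0] + 1.0) * 0.5 * WIDTH),  # viewport transform,
         round((1.0 - x_ndc[1]) * 0.5 * HEIGHT)) # top-left origin assumed

print(w)      # 8.0
print(pixel)  # (1095, 540)
```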

Every vertex follows the same route from mesh to pixel:

The spaces, in order: object \to world (Model) \to camera (View) \to clip (Projection) \to NDC (divide) \to screen (Viewport).
The three matrices combine into one: \mathrm{MVP} = P \cdot V \cdot M.
It is applied right-to-left with column vectors: \mathrm{MVP}\,\vec{x} = P(V(M\vec{x})), so M acts first and P last.
One matrix per object, run on every vertex; the divide and viewport are fixed-function and finish the job.

The three matrices update on three different clocks, which is exactly why engines keep them separate until the last moment:

Model M — per object. Each mesh has its own placement; a thousand objects means a thousand model matrices, each rebuilt when that object moves.
View V — per camera. One matrix for the whole scene's viewpoint, rebuilt only when the camera moves.
Projection P — per lens or window change (rarely). It is rebuilt only when the field of view, aspect ratio, or clip planes change, for example on a window resize or a zoom.

Engines therefore upload V and P once per frame (often premultiplied as P V) and loop over objects, multiplying in each object's M to form \mathrm{MVP} = P V M. Under the column-vector convention used here, getting that multiplication order wrong by writing M V P is one of the most common bugs in a fresh renderer, and the symptom (a scene that vanishes or smears) is gloriously unhelpful.
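A rough sketch of that per-frame loop, in plain Python rather than any real engine's API: the product P V is formed once, and each object's M is multiplied in on the right. The object names, placements, and the stripped-down projection matrix are all hypothetical.

```python
def translation(tx, ty, tz):
    """4x4 translation matrix for column vectors."""
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]

def mat_mul(a, b):
    """Product of two 4x4 matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

# Per camera: unrotated camera at (0, 1, 3).
V = translation(0.0, -1.0, -3.0)

# Rarely changes: a stand-in projection that only copies -z into w.
P = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, -1.0, 0.0]]

# Per object: one model matrix each (hypothetical scene).
model_matrices = {
    "teapot": translation(2.0, 0.0, -5.0),
    "cup": translation(-1.0, 0.0, -4.0),
}

# Hoist the shared part out of the loop: one multiply per frame.
PV = mat_mul(P, V)

# One extra multiply per object, with M on the right so it acts first.
mvp_per_object = {name: mat_mul(PV, M) for name, M in model_matrices.items()}

spout = [0.0, 1.0, 0.0, 1.0]
print(mat_vec(mvp_per_object["teapot"], spout))  # [2.0, 0.0, -8.0, 8.0]
```

Hoisting P V saves one matrix product per object. Whether that matters in practice depends on the object count and on where the multiplication runs.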