The Model Matrix

A mesh is born in its own little world: a teapot centred on its own origin, a character standing at (0,0,0) facing along its own z-axis. That is object space. To drop it into the scene at the right size, facing the right way, in the right place, we need one matrix that does all three. Engines call it the model matrix (or world matrix), and it is built by multiplying a translation, a rotation and a scale into a single 4\times 4 matrix.

Building M = T \cdot R \cdot S

Step 1 — decide the order of operations. We want to first scale the mesh to its proper size, then rotate it to face its direction, then translate it to its spot. Scale first, translate last. Written as a chain of functions applied to a vertex \vec{x}, that reads inside-out:

\vec{x} \;\longmapsto\; T\big(R\big(S\,\vec{x}\big)\big).

Step 2 — collapse the chain into one matrix. With column vectors, composing maps is multiplying their matrices, and the matrix nearest the vector acts first. So the three combine into

M = T \cdot R \cdot S.
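
Written out in block form, the product has a simple shape. The notation here is an assumption made for illustration: R_{3} is the 3\times 3 rotation block of R, \vec{t} is the translation vector, and s_x, s_y, s_z are the scale factors along each axis.

M = T \cdot R \cdot S = \begin{pmatrix} R_{3}\,\operatorname{diag}(s_x, s_y, s_z) & \vec{t} \\ \vec{0}^{\,\top} & 1 \end{pmatrix}.

The upper-left block carries the scaled rotation, the last column carries the translation untouched, and the bottom row stays (0, 0, 0, 1).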

Step 3 — read the order off the product. The crucial, much-tripped-over fact: with column vectors the rightmost matrix is applied first. Reading M\vec{x} = T(R(S\vec{x})) right-to-left, the vertex is scaled, then rotated, then translated — exactly the order we wanted in Step 1, even though T is written on the left.

M\vec{x} = T\,R\,S\,\vec{x} = \underbrace{T\big(\underbrace{R\big(\underbrace{S\,\vec{x}}_{1}\big)}_{2}\big)}_{3}.
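
The right-to-left reading can be checked numerically. What follows is a minimal sketch in plain Python, not any engine's API: the helper names (matmul, apply, translation, rotation_z, scale) and the sample numbers are made up for the example. It builds M = T \cdot R \cdot S, compares it against scaling, rotating and translating one step at a time, and then tries the reversed product S \cdot R \cdot T on the same vertex.

```python
import math

def matmul(a, b):
    """Multiply two 4x4 matrices stored as row-major nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def apply(m, v):
    """Multiply a 4x4 matrix by a column vector given as a 4-element list."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def rotation_z(angle):
    """Rotation about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def scale(sx, sy, sz):
    return [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]

# Illustrative values only: double the size, turn 90 degrees, move 5 along x.
T = translation(5.0, 0.0, 0.0)
R = rotation_z(math.pi / 2)
S = scale(2.0, 2.0, 2.0)

x = [1.0, 0.0, 0.0, 1.0]                        # homogeneous vertex

M = matmul(T, matmul(R, S))                     # M = T . R . S
all_at_once = apply(M, x)                       # approximately [5, 2, 0, 1]
step_by_step = apply(T, apply(R, apply(S, x)))  # scale, rotate, translate

M_reversed = matmul(S, matmul(R, T))            # S . R . T: translate acts first
wrong_order = apply(M_reversed, x)              # approximately [0, 12, 0, 1]

print(all_at_once, step_by_step, wrong_order)
```

The first two results agree up to floating-point rounding, while the reversed product moves the vertex before rotating and scaling it, so the offset itself gets rotated and doubled.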

One matrix, every vertex

Step 4 — apply M to a vertex. Each homogeneous vertex (x, y, z, 1) of the mesh is multiplied by the same M:

\vec{x}_{\text{world}} = M\,\vec{x}_{\text{object}}.
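
As a small worked example, with numbers chosen purely for illustration: take a uniform scale of 2, a rotation of 90° about the z-axis, and a translation of (5, 0, 0). The object-space vertex (1, 0, 0, 1) is scaled to (2, 0, 0), rotated to (0, 2, 0), and translated to (5, 2, 0). The single matrix gives the same answer in one multiply:

M = \begin{pmatrix} 0 & -2 & 0 & 5 \\ 2 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M \begin{pmatrix} 1 \\ 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 5 \\ 2 \\ 0 \\ 1 \end{pmatrix}.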

Step 5 — note what just happened. The map carried the vertex from object space to world space: from the mesh's private coordinates into the shared coordinates of the scene. And here is the leverage: M is built once per object per frame, then reused for every vertex of the mesh:

\vec{v}^{\,(k)}_{\text{world}} = M\,\vec{v}^{\,(k)}_{\text{object}}, \qquad k = 1, 2, \dots, N.

A million-vertex dragon is posed by computing one 4\times 4 matrix and then doing a million matrix–vector multiplies that all share that same matrix, which is the operation a GPU eats for breakfast. That is why the model matrix is the beating heart of every object's place in the world.
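
The "one matrix, every vertex" pattern can be sketched on the CPU in a few lines of plain Python. The function name transform_point, the sample matrix and the four vertices are all hypothetical, and the code assumes the bottom row of M is (0, 0, 0, 1) so that w stays 1 and can be dropped. On real hardware this loop would typically run in a vertex shader instead.

```python
def transform_point(m, p):
    """Apply a 4x4 matrix (row-major nested lists) to a 3D point.

    The point is treated as (x, y, z, 1). The bottom row of `m` is assumed
    to be (0, 0, 0, 1), so only the first three rows are evaluated.
    """
    x, y, z = p
    return tuple(m[i][0] * x + m[i][1] * y + m[i][2] * z + m[i][3]
                 for i in range(3))

# Hypothetical model matrix: uniform scale 2, 90 degrees about z,
# then a translation of (5, 0, 0).
M = [[0.0, -2.0, 0.0, 5.0],
     [2.0,  0.0, 0.0, 0.0],
     [0.0,  0.0, 2.0, 0.0],
     [0.0,  0.0, 0.0, 1.0]]

# A tiny stand-in for the mesh: the origin and the three unit axis points.
object_vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]

# M is built once; every vertex reuses it.
world_vertices = [transform_point(M, v) for v in object_vertices]

for obj, world in zip(object_vertices, world_vertices):
    print(obj, "->", world)
```

Only the vertex changes from one iteration to the next, which is what makes the work so easy to run in parallel.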

The model (world) matrix that places an object in the scene is:

built as the product M = T \cdot R \cdot S — a scale, then a rotation, then a translation.
applied right-to-left with column vectors: M\vec{x} = T(R(S\vec{x})), so S acts first and T last.
the map from object space to world space, \vec{x}_{\text{world}} = M\,\vec{x}_{\text{object}}.
computed once per object, then used to transform every vertex of the mesh.

An engine rarely treats M as the source of truth. It keeps the three human quantities, a position \vec{t}, a rotation (usually stored as a quaternion and converted to the matrix R when needed), and a scale \vec{s}, because those are what designers and gameplay code want to read and tweak ("move it 2 metres left", "spin it 30°"). Each frame, just before drawing, the engine bakes them into the matrix:

M \;=\; T(\vec{t}) \cdot R \cdot S(\vec{s}).
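
One way the bake step might look is sketched below in plain Python. Nothing here is taken from a particular engine: the function names quat_to_matrix3 and bake_model_matrix are hypothetical, the quaternion is assumed to be a unit quaternion in (w, x, y, z) order, and the matrix is stored as row-major nested lists for column vectors. Rather than multiplying three 4\times 4 matrices, it writes the product directly: each column of the rotation is multiplied by the matching scale factor, and the position fills the last column.

```python
import math

def quat_to_matrix3(q):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ]

def bake_model_matrix(position, rotation, scale):
    """Build M = T(position) . R(rotation) . S(scale) as a 4x4 nested list.

    Column j of the rotation block is multiplied by scale[j] (that is R . S),
    and the position goes in the last column (that is the effect of T).
    """
    r = quat_to_matrix3(rotation)
    m = [[r[i][j] * scale[j] for j in range(3)] + [position[i]]
         for i in range(3)]
    m.append([0.0, 0.0, 0.0, 1.0])
    return m

# Illustrative values: 90 degrees about z is the quaternion
# (cos 45deg, 0, 0, sin 45deg).
half_angle = math.pi / 4
rotation = (math.cos(half_angle), 0.0, 0.0, math.sin(half_angle))

M = bake_model_matrix(position=(5.0, 0.0, 0.0),
                      rotation=rotation,
                      scale=(2.0, 2.0, 2.0))
for row in M:
    print([round(value, 6) for value in row])
```

Keeping position, rotation and scale as the stored state and regenerating M from them means small edits never accumulate rounding error inside the matrix itself.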

Edit the friendly triple, rebuild M, hand it to the GPU. And because a parent's M can be multiplied onto a child's own matrix, whole skeletons and scene graphs are just chains of model matrices. That is the subject of transform order, where getting the multiplication order wrong is the classic bug.
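
A two-node chain is enough to see both the idea and the bug. The sketch below assumes the common column-vector convention that a child's world matrix is the parent's world matrix times the child's local matrix. The names parent_world, child_local and matmul, and the numbers, are illustrative only.

```python
def matmul(a, b):
    """Multiply two 4x4 matrices stored as row-major nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def origin_of(m):
    """World position of a node's local origin: the last column of its matrix."""
    return (m[0][3], m[1][3], m[2][3])

# Parent: rotated 90 degrees about z, then placed at (10, 0, 0).
parent_world = [[0, -1, 0, 10],
                [1,  0, 0,  0],
                [0,  0, 1,  0],
                [0,  0, 0,  1]]

# Child: sits 1 unit along the parent's local x-axis.
child_local = [[1, 0, 0, 1],
               [0, 1, 0, 0],
               [0, 0, 1, 0],
               [0, 0, 0, 1]]

# Assumed convention: parent on the left, child on the right.
child_world = matmul(parent_world, child_local)
print(origin_of(child_world))    # (10, 1, 0)

# Swapping the factors is the classic bug: the child's offset is no longer
# expressed in the parent's rotated frame.
swapped = matmul(child_local, parent_world)
print(origin_of(swapped))        # (11, 0, 0)
```

With the factors in the assumed order, the child's offset is rotated along with its parent. Swapped, the offset is applied in world axes and the child ends up in the wrong place.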