The Model Matrix
A mesh is born in its own little world: a teapot centred on its own origin, a character standing at (0,0,0) facing down its own z-axis. That is object space. To drop it into the scene at the right size, facing the right way, in the right place, we need one matrix that does all three. Engines call it the model matrix (or world matrix), and it is built by multiplying a translation, a rotation and a scale into a single 4\times 4.
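For reference, here is one common way to write the three factors as homogeneous 4\times 4 matrices. This assumes column vectors, a translation \vec{t} = (t_x, t_y, t_z), a per-axis scale \vec{s} = (s_x, s_y, s_z) and, as one concrete choice of R, a rotation by \theta about the z-axis. A general R has the same shape: a 3\times 3 rotation block in the upper left and a 1 in the corner.

```latex
T(\vec{t}) = \begin{pmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix},
\quad
R_z(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 & 0 \\ \sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},
\quad
S(\vec{s}) = \begin{pmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & s_z & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
```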
Building M = T \cdot R \cdot S
Step 1: decide the order of operations. We want to first scale the mesh to its proper size, then rotate it to face its direction, then translate it to its spot. Scale first, translate last. Written as a chain of functions applied to a vertex \vec{x}, that reads inside-out:
\vec{x} \;\longmapsto\; T\big(R\big(S\,\vec{x}\big)\big).
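A small worked example, with numbers chosen purely for illustration: take the vertex \vec{x} = (1, 0, 0), let S scale uniformly by 2, let R rotate by 90° about the z-axis, and let T translate by (5, 0, 0). Applying the functions inside-out moves the vertex one step at a time:

```latex
\vec{x} = (1, 0, 0)
\;\overset{S}{\longmapsto}\; (2, 0, 0)
\;\overset{R}{\longmapsto}\; (0, 2, 0)
\;\overset{T}{\longmapsto}\; (5, 2, 0).
```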
Step 2: collapse the chain into one matrix. With column vectors, composing maps is multiplying their matrices, and the matrix nearest the vector acts first. So the three combine into
M = T \cdot R \cdot S.
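A minimal sketch of Step 2 in plain Python, assuming 4\times 4 matrices stored as lists of rows and homogeneous column vectors [x, y, z, w]. The helper names (`matmul`, `apply`, `translation`, `rotation_z`, `scale`) are hypothetical, and a z-axis rotation stands in for a general R. It collapses T \cdot R \cdot S into one M and checks that the single product lands the vertex in the same place as the three-step chain T(R(S\vec{x})).

```python
import math


def matmul(a, b):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply(m, v):
    """Multiply a 4x4 matrix by a homogeneous column vector [x, y, z, w]."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def rotation_z(theta):
    """Rotation by theta radians about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def scale(sx, sy, sz):
    return [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]


T = translation(5, 0, 0)
R = rotation_z(math.pi / 2)  # 90 degrees about z
S = scale(2, 2, 2)

M = matmul(matmul(T, R), S)  # M = T . R . S, collapsed once

x = [1, 0, 0, 1]  # homogeneous vertex (1, 0, 0)
one_product = apply(M, x)
chain = apply(T, apply(R, apply(S, x)))

print([round(c, 6) for c in one_product])  # [5.0, 2.0, 0.0, 1.0]
print([round(c, 6) for c in chain])        # [5.0, 2.0, 0.0, 1.0]
```

Rounding only hides floating-point noise from \cos(\pi/2), which is about 6\times 10^{-17} rather than exactly 0.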
Step 3: read the order off the product. The crucial, much-tripped-over fact: with column vectors the rightmost matrix is applied first. Reading M\vec{x} = T(R(S\vec{x})) right-to-left, the vertex is scaled, then rotated, then translated. That is exactly the order we wanted in Step 1, even though T is written on the left.
M\vec{x} = T\,R\,S\,\vec{x} = \underbrace{T\,(}_{3}\underbrace{R\,(}_{2}\underbrace{S\,\vec{x}}_{1}\,)\,).
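To see why the order matters, here is a hedged plain-Python sketch (hypothetical helper names, redefined so the snippet stands alone) that multiplies the same three matrices in the opposite order. With S \cdot R \cdot T the translation acts first, so the offset of 5 is itself rotated by 90° and then doubled.

```python
import math


def matmul(a, b):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply(m, v):
    """Multiply a 4x4 matrix by a homogeneous column vector [x, y, z, w]."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def rotation_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def scale(sx, sy, sz):
    return [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]


T = translation(5, 0, 0)
R = rotation_z(math.pi / 2)  # 90 degrees about z
S = scale(2, 2, 2)
x = [1, 0, 0, 1]

M_trs = matmul(matmul(T, R), S)  # rightmost first: scale, rotate, translate
M_srt = matmul(matmul(S, R), T)  # rightmost first: translate, rotate, scale

p_trs = [round(c, 6) for c in apply(M_trs, x)]
p_srt = [round(c, 6) for c in apply(M_srt, x)]
print(p_trs)  # [5.0, 2.0, 0.0, 1.0]
print(p_srt)  # [0.0, 12.0, 0.0, 1.0]
```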
One matrix, every vertex
Step 4: apply M to a vertex. Each homogeneous vertex (x, y, z, 1) of the mesh is multiplied by the same M:
\vec{x}_{\text{world}} = M\,\vec{x}_{\text{object}}.
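Why the trailing 1 matters: assuming R and S have the usual homogeneous form (a 3\times 3 block in the upper left, a 1 in the corner, zeros elsewhere) and writing those 3\times 3 blocks as R_3 and S_3, the product has a simple block structure. The 1 in the vertex is what picks up the translation column, and the result keeps a 1 in its last slot, so it is again a point in the same homogeneous form:

```latex
M = T(\vec{t})\,R\,S =
\begin{pmatrix} R_3 S_3 & \vec{t} \\ \vec{0}^{\mathsf{T}} & 1 \end{pmatrix},
\qquad
M \begin{pmatrix} \vec{x} \\ 1 \end{pmatrix}
= \begin{pmatrix} R_3 S_3\,\vec{x} + \vec{t} \\ 1 \end{pmatrix}.
```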
Step 5: note what just happened. The map carried the vertex from object space → world space, from the mesh's private coordinates into the shared coordinates of the scene. And here is the leverage: M is built once per object per frame, then reused for every vertex of the mesh:
\vec{v}^{\,(k)}_{\text{world}} = M\,\vec{v}^{\,(k)}_{\text{object}}, \qquad k = 1, 2, \dots, N.
A million-vertex dragon is posed by computing one 4\times 4 and then doing a million identical matrix–vector multiplies (one per vertex), which is exactly the operation a GPU eats for breakfast. That is why the model matrix is the beating heart of every object's place in the world.
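A minimal sketch of "build once, reuse for every vertex" in plain Python. The helper `model_matrix` is hypothetical and uses a z-axis rotation for R; the mesh is just the four corners of a unit square, an illustrative stand-in for a real vertex buffer. A GPU would run the per-vertex multiply in parallel rather than in a Python loop.

```python
import math


def matmul(a, b):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply(m, v):
    """Multiply a 4x4 matrix by a homogeneous column vector [x, y, z, w]."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def model_matrix(t, theta, s):
    """Hypothetical helper: M = T(t) . R_z(theta) . S(s)."""
    c, sn = math.cos(theta), math.sin(theta)
    T = [[1, 0, 0, t[0]], [0, 1, 0, t[1]], [0, 0, 1, t[2]], [0, 0, 0, 1]]
    R = [[c, -sn, 0, 0], [sn, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    S = [[s[0], 0, 0, 0], [0, s[1], 0, 0], [0, 0, s[2], 0], [0, 0, 0, 1]]
    return matmul(matmul(T, R), S)


# Object-space vertices of a unit square, in homogeneous form (x, y, z, 1).
object_vertices = [[0, 0, 0, 1], [1, 0, 0, 1], [1, 1, 0, 1], [0, 1, 0, 1]]

# One matrix per object (per frame)...
M = model_matrix(t=(5, 0, 0), theta=math.pi / 2, s=(2, 2, 2))

# ...reused for every vertex k = 1..N.
world_vertices = [apply(M, v) for v in object_vertices]

for v in world_vertices:
    # approximately (5, 0, 0), (5, 2, 0), (3, 2, 0), (3, 0, 0)
    print([round(c, 6) for c in v])
```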
The model (world) matrix that places an object in the scene is:
- built as the product M = T \cdot R \cdot S: a scale, then a rotation, then a translation.
- applied right-to-left with column vectors: M\vec{x} = T(R(S\vec{x})), so S acts first and T last.
- the map from object space to world space, \vec{x}_{\text{world}} = M\,\vec{x}_{\text{object}}.
- computed once per object, then used to transform every vertex of the mesh.
An engine almost never treats M as the thing it edits. Instead it keeps three human-friendly quantities: a position \vec{t}, a rotation R (usually stored as a quaternion), and a scale \vec{s}. Those are what designers and gameplay code want to read and tweak ("move it 2 metres left", "spin it 30°"). Each frame, just before drawing, the engine bakes them into the matrix:
M \;=\; T(\vec{t}) \cdot R \cdot S(\vec{s}).
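A hedged sketch of that per-frame bake, assuming the rotation is stored as a unit quaternion ordered (w, x, y, z) and converted with the standard quaternion-to-rotation-matrix formula. Because S is diagonal, R \cdot S just scales column j of R by s_j, and T drops \vec{t} into the last column, so the product can be written out directly instead of running two general 4\times 4 multiplies. The names `quat_about_z`, `quat_to_rot3` and `bake_model_matrix` are hypothetical, as is the reading of "left" as the -x direction.

```python
import math


def quat_about_z(degrees):
    """Unit quaternion (w, x, y, z) for a rotation of `degrees` about the z-axis."""
    half = math.radians(degrees) / 2
    return (math.cos(half), 0.0, 0.0, math.sin(half))


def quat_to_rot3(w, x, y, z):
    """3x3 rotation matrix of a unit quaternion w + x i + y j + z k."""
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def bake_model_matrix(t, q, s):
    """M = T(t) . R(q) . S(s) as a 4x4 list of rows (column-vector convention)."""
    r = quat_to_rot3(*q)
    # Upper-left 3x3 block is R . S: column j of R scaled by s[j].
    # Last column is the translation t; bottom row is (0, 0, 0, 1).
    rows = [[r[i][j] * s[j] for j in range(3)] + [t[i]] for i in range(3)]
    rows.append([0.0, 0.0, 0.0, 1.0])
    return rows


def apply(m, v):
    """Multiply a 4x4 matrix by a homogeneous column vector [x, y, z, w]."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


# The "friendly triple" a designer edits: position, rotation, scale.
position = (5.0, 0.0, 0.0)
rotation = quat_about_z(90)
scale = (2.0, 2.0, 2.0)

M = bake_model_matrix(position, rotation, scale)
print([round(c, 6) for c in apply(M, [1, 0, 0, 1])])  # [5.0, 2.0, 0.0, 1.0]

# "Move it 2 metres left" (assuming -x is left): edit the triple, rebuild M.
position = (position[0] - 2.0, position[1], position[2])
M = bake_model_matrix(position, rotation, scale)
print([round(c, 6) for c in apply(M, [1, 0, 0, 1])])  # [3.0, 2.0, 0.0, 1.0]
```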
Edit the friendly triple, rebuild M, hand it to the GPU. And because one object's M can be multiplied onto a child's, whole skeletons and scene graphs are just chains of model matrices. That is the subject of transform order, where getting the multiplication order wrong is the classic bug.
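A small, hedged sketch of that chaining in plain Python, using a hypothetical `Node` class rather than any particular engine's API: a child's world matrix is taken as its parent's world matrix times its own local model matrix. The numbers are illustrative: a body placed at (10, 0, 0) and an arm attached 1 unit along the body's x-axis and turned 90° about z. Swapping the two factors sends the same vertex somewhere else entirely, which is one form of the ordering bug.

```python
import math


def matmul(a, b):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply(m, v):
    """Multiply a 4x4 matrix by a homogeneous column vector [x, y, z, w]."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def rotation_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


class Node:
    """Hypothetical scene-graph node: a local model matrix plus an optional parent."""

    def __init__(self, local, parent=None):
        self.local = local  # maps this node's space into its parent's space
        self.parent = parent

    def world(self):
        """Chain of model matrices from the root down to this node."""
        if self.parent is None:
            return self.local
        # Parent on the left, child on the right: the child's own
        # transform acts on the vertex first, then the parent's.
        return matmul(self.parent.world(), self.local)


body = Node(translation(10, 0, 0))
arm = Node(matmul(translation(1, 0, 0), rotation_z(math.pi / 2)), parent=body)

hand_tip = [2, 0, 0, 1]  # a vertex in the arm's object space
right = apply(arm.world(), hand_tip)
wrong = apply(matmul(arm.local, body.local), hand_tip)  # factors swapped

print([round(c, 6) for c in right])  # [11.0, 2.0, 0.0, 1.0]
print([round(c, 6) for c in wrong])  # [1.0, 12.0, 0.0, 1.0]
```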