An object's model matrix M takes a vertex from the object's own space out into the world: it is how the engine plants the mesh in the
level. You constantly need to go the other way, from world back into object space. "Where
did the player click, in the chair's local frame?" "Is this bullet inside my crate's box?" The
matrix that runs the model matrix backwards is the inverse, M^{-1}:
\vec{v}_{\text{object}} = M^{-1}\,\vec{v}_{\text{world}}, \qquad M^{-1} M = I.
Computing a general 4\times 4 inverse is a chore. The happy news is
that the transforms games care about most — rigid moves of solid bodies — have an inverse you
can write down by hand, no row-reduction required. Let us derive it.
The cheap inverse of a rigid transform
A rigid transform is a rotation followed by a translation — it moves a body
without bending or scaling it, exactly what a camera or a solid prop undergoes. Acting on a
point \vec{x} it does
M\vec{x} = R\vec{x} + \vec{t},
where R is a rotation and \vec{t} a
translation. We want the \vec{y} \mapsto \vec{x} map that undoes it.
Step 1 — start from the forward map and solve for the input. Set
\vec{y} = R\vec{x} + \vec{t} and isolate \vec{x}.
Subtract the translation:
\vec{y} - \vec{t} = R\vec{x}.
Step 2 — undo the rotation. Multiply on the left by
R^{-1}:
\vec{x} = R^{-1}(\vec{y} - \vec{t}) = R^{-1}\vec{y} - R^{-1}\vec{t}.
Step 3 — replace the rotation inverse with a transpose. Here is the whole
point. A rotation is orthogonal: its columns are orthonormal, so
R^{\top} R = I, which means
R^{-1} = R^{\top}. Inverting a rotation is just transposing it (flip it across the diagonal; no division, no determinant):
\vec{x} = R^{\top}\vec{y} - R^{\top}\vec{t}.
Step 4 — read off the inverse transform. Compare
\vec{x} = R^{\top}\vec{y} + (-R^{\top}\vec{t}) with the rigid form
R'\vec{y} + \vec{t}'. The inverse is itself rigid, with rotation
R^{\top} and translation -R^{\top}\vec{t}:
M^{-1} = \big(\,R^{\top},\; -R^{\top}\vec{t}\,\big).
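The same result can be written in 4\times 4 homogeneous form. The block below is a sketch that assumes the column-vector convention used above, with points carrying a fourth coordinate of 1 (the text does not spell this layout out); multiplying the two blocks confirms M^{-1} M = I.

```latex
M = \begin{pmatrix} R & \vec{t} \\ \vec{0}^{\top} & 1 \end{pmatrix},
\qquad
M^{-1} = \begin{pmatrix} R^{\top} & -R^{\top}\vec{t} \\ \vec{0}^{\top} & 1 \end{pmatrix},
\qquad
M^{-1} M
= \begin{pmatrix} R^{\top}R & R^{\top}\vec{t} - R^{\top}\vec{t} \\ \vec{0}^{\top} & 1 \end{pmatrix}
= \begin{pmatrix} I & \vec{0} \\ \vec{0}^{\top} & 1 \end{pmatrix}.
```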
Step 5 — read it as a recipe. "Rotate back, then shift back." Strip off the
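rotation by applying R^{\top} to the world point; the leftover
offset -R^{\top}\vec{t}, which is the translation negated and expressed in the object's frame, slides the origin home. Equivalently, shift first and rotate second: \vec{x} = R^{\top}(\vec{y} - \vec{t}). Either way the cost is a transpose and a
matrix–vector product, pennies compared with a general inverse.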
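The recipe from Steps 1 to 5 can be sketched in a few lines of plain Python. The helper names (`rigid_apply`, `rigid_inverse`, `rotation_z`) and the list-of-rows matrix layout are illustrative assumptions, not part of any particular engine's API. The shortcut is only valid when R is a pure rotation.

```python
import math

def transpose(R):
    """Transpose a 3x3 matrix stored as a list of rows."""
    return [[R[j][i] for j in range(3)] for i in range(3)]

def mat_vec(R, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]

def rigid_apply(R, t, x):
    """Forward map of a rigid transform: y = R x + t."""
    Rx = mat_vec(R, x)
    return [Rx[i] + t[i] for i in range(3)]

def rigid_inverse(R, t):
    """Return (R^T, -R^T t), the inverse of the rigid transform (R, t).

    Assumes R has orthonormal columns (a pure rotation). If R also
    contains scale or shear, this shortcut gives the wrong answer.
    """
    Rt = transpose(R)
    return Rt, [-c for c in mat_vec(Rt, t)]

def rotation_z(angle):
    """Rotation by `angle` radians about the z axis (hypothetical helper)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

R = rotation_z(math.radians(30.0))
t = [4.0, -2.0, 1.0]
x = [1.0, 2.0, 3.0]

y = rigid_apply(R, t, x)                # object -> world
R_inv, t_inv = rigid_inverse(R, t)      # no general matrix inverse needed
x_back = rigid_apply(R_inv, t_inv, y)   # world -> object, recovers x up to rounding
```

The inverse is applied with the same `rigid_apply` function as the forward map, which reflects Step 4: the inverse of a rigid transform is itself a rigid transform.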
Step 6 — name the most important instance: the view matrix. A camera is a
rigid body too. Its world transform M_{\text{cam}} = (R, \vec{t})
says where the camera is. Rendering needs the opposite — every object expressed in the
camera's frame — so the view matrix is precisely the camera's inverse:
V = M_{\text{cam}}^{-1} = \big(\,R^{\top},\; -R^{\top}\vec{t}\,\big).
That is why moving the camera right shoves the whole world left: the view matrix is the camera's
world transform, run backwards by Step 4.
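A small numerical check makes the "camera right, world left" claim concrete. This sketch assumes the camera's local +x axis is its "right" direction and that y is up; both are common conventions but are not stated in the text. The function names are hypothetical.

```python
import math

def view_from_camera(R, t):
    """View transform (R^T, -R^T t) for a camera with world rotation R and position t."""
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    offset = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return Rt, offset

def to_view(view, p):
    """Express world point p in the camera's frame."""
    Rv, tv = view
    return [sum(Rv[i][k] * p[k] for k in range(3)) + tv[i] for i in range(3)]

def rotation_y(angle):
    """Rotation by `angle` radians about the y axis (assumed to be 'up')."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

R = rotation_y(math.radians(40.0))   # camera yawed 40 degrees
t = [2.0, 1.0, 5.0]                  # camera position in the world
p = [0.0, 1.0, -3.0]                 # some fixed world point

before = to_view(view_from_camera(R, t), p)

# Slide the camera one unit along its own right axis.
# The camera's local +x in world coordinates is the first column of R.
right = [R[i][0] for i in range(3)]
t_moved = [t[i] + right[i] for i in range(3)]
after = to_view(view_from_camera(R, t_moved), p)

# In view space the point has moved one unit toward -x (left);
# its y and z view coordinates are unchanged.
shift = [after[i] - before[i] for i in range(3)]
```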
Let M\vec{x} = R\vec{x} + \vec{t} be a rigid (rotation + translation)
transform with R orthogonal.
- The inverse maps world coordinates back to object coordinates: \vec{v}_{\text{object}} = M^{-1}\vec{v}_{\text{world}}.
- Because a rotation is orthogonal, R^{-1} = R^{\top}; no general inverse is needed.
- The inverse is itself rigid: M^{-1} = (\,R^{\top},\, -R^{\top}\vec{t}\,), that is, rotate back, then shift back.
- The camera's view matrix is exactly the inverse of the camera's world transform, V = M_{\text{cam}}^{-1}.
You have a crate sitting in the world at some jaunty angle, and a bullet at world position
\vec{p}. Testing "is \vec{p} inside this
rotated box?" against the tilted faces is fiddly trigonometry. The slick move is to stop
fighting the rotation: pull the point into the crate's own frame with the inverse,
\vec{p}_{\text{local}} = M^{-1}\vec{p} = R^{\top}(\vec{p} - \vec{t}),
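and now the box is axis-aligned, so the test collapses to three trivial comparisons,
|x| \le w, |y| \le h, |z| \le d, where (x, y, z) = \vec{p}_{\text{local}} and w, h, d are the box's half-extents measured from its center (taken here to be the crate's origin). World \to local is the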
standard trick for collision, picking, and "what am I looking at?" ray queries: transform the
hard problem into the frame where it's easy, using an inverse that, for a rigid body, costs
almost nothing.
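As a minimal sketch of that test, the code below pulls a world point into the crate's frame with R^{\top}(\vec{p} - \vec{t}) and compares against half-extents. It assumes the crate's local origin sits at the box center and that R is a pure rotation; `world_to_local` and `point_in_box` are illustrative names, not functions from a specific engine.

```python
import math

def world_to_local(R, t, p):
    """p_local = R^T (p - t) for a rigid transform (R, t), R stored as rows."""
    d = [p[i] - t[i] for i in range(3)]
    # Row i of R^T is column i of R, so index R as R[k][i].
    return [sum(R[k][i] * d[k] for k in range(3)) for i in range(3)]

def point_in_box(R, t, half_extents, p):
    """True if world point p lies inside the rotated box.

    half_extents = (w, h, d): half the box size along its local x, y, z axes.
    Assumes the box is centered on its local origin.
    """
    local = world_to_local(R, t, p)
    return all(abs(local[i]) <= half_extents[i] for i in range(3))

# A long, thin crate rotated 45 degrees about z and centered at (10, 0, 0).
c, s = math.cos(math.radians(45.0)), math.sin(math.radians(45.0))
R = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
t = [10.0, 0.0, 0.0]
half_extents = (2.0, 0.5, 0.5)

# 1.5 units from the center along the crate's own long axis: inside.
along_axis = [t[0] + 1.5 * c, t[1] + 1.5 * s, t[2]]
hit_along_axis = point_in_box(R, t, half_extents, along_axis)

# 1.5 units from the center along world x: pokes out through a side face.
along_world_x = [t[0] + 1.5, t[1], t[2]]
hit_along_world_x = point_in_box(R, t, half_extents, along_world_x)
```

The two probe points are the same distance from the crate's center; only the one aligned with the crate's long axis passes, which is exactly the distinction the axis-aligned comparison in local space captures.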