Inverse Transforms

An object's model matrix M takes a vertex from the object's own space out into the world: it is how the engine plants the mesh in the level. You constantly need to go the other way, from world space back into object space. "Where did the player click, in the chair's local frame?" "Is this bullet inside my crate's box?" The matrix that runs the model matrix backwards is its inverse, M^{-1}:

\vec{v}_{\text{object}} = M^{-1}\,\vec{v}_{\text{world}}, \qquad M^{-1} M = I.

Computing a general 4\times 4 inverse is a chore. The happy news is that the transforms games care about most — rigid moves of solid bodies — have an inverse you can write down by hand, no row-reduction required. Let us derive it.
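For contrast, here is what a general inverse involves. The sketch below is a minimal Gauss-Jordan routine in plain Python, assuming matrices are nested lists and that a 4×4 transform carries its translation in the last column (a column-vector convention the text does not itself fix). The function names are illustrative only.

```python
import math

def general_inverse(M):
    """Invert an n x n matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    # Build the augmented matrix [M | I].
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        # Partial pivoting: swap up the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot entry becomes 1.
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    # The right half now holds M^{-1}.
    return [row[n:] for row in aug]

def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

# A rigid transform in homogeneous form: rotate 30 degrees about z, then translate.
c, s = math.cos(math.radians(30)), math.sin(math.radians(30))
M = [[c, -s, 0.0, 2.0],
     [s,  c, 0.0, 3.0],
     [0.0, 0.0, 1.0, 4.0],
     [0.0, 0.0, 0.0, 1.0]]
M_inv = general_inverse(M)
check = matmul(M_inv, M)   # should be the 4x4 identity up to rounding
```

Even at 4×4 this needs row swaps, a singularity test, and work on the order of n^3 multiply-adds, none of which exploits the structure of a rigid transform.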

The cheap inverse of a rigid transform

A rigid transform is a rotation followed by a translation: it moves a body without bending or scaling it, which is exactly what a camera or a solid prop undergoes. Acting on a point \vec{x}, it computes

M\vec{x} = R\vec{x} + \vec{t},

where R is a rotation and \vec{t} a translation. Calling the output \vec{y}, we want the map \vec{y} \mapsto \vec{x} that undoes it.

Step 1 — start from the forward map and solve for the input. Set \vec{y} = R\vec{x} + \vec{t} and isolate \vec{x}. Subtract the translation:

\vec{y} - \vec{t} = R\vec{x}.

Step 2 — undo the rotation. Multiply on the left by R^{-1}:

\vec{x} = R^{-1}(\vec{y} - \vec{t}) = R^{-1}\vec{y} - R^{-1}\vec{t}.

Step 3 — replace the rotation inverse with a transpose. Here is the whole point. A rotation is orthogonal: its columns are orthonormal, so R^{\top} R = I, which means R^{-1} = R^{\top}. Inverting a rotation is just transposing it — flip it across the diagonal, no division, no determinant:

\vec{x} = R^{\top}\vec{y} - R^{\top}\vec{t}.
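A quick numerical check of Step 3, as a sketch in plain Python. The helpers rot_z and rot_x and the nested-list layout are assumptions for illustration. It composes two rotations, confirms R^{\top}R = I, and recovers \vec{x} from \vec{y} using only the transpose.

```python
import math

def rot_z(deg):
    """Rotation by deg degrees about the z axis (column-vector convention)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(deg):
    """Rotation by deg degrees about the x axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

# A product of rotations is still a rotation.
R = matmul(rot_z(40), rot_x(25))
Rt = transpose(R)

# Orthogonality: R^T R should equal the identity up to rounding.
RtR = matmul(Rt, R)
is_orthogonal = all(abs(RtR[i][j] - (1.0 if i == j else 0.0)) < 1e-12
                    for i in range(3) for j in range(3))

# Forward map y = R x + t, then Step 3: x = R^T y - R^T t.
t = [5.0, -1.0, 2.0]
x = [1.0, 2.0, 3.0]
y = [a + b for a, b in zip(matvec(R, x), t)]
x_back = [a - b for a, b in zip(matvec(Rt, y), matvec(Rt, t))]
```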

Step 4 — read off the inverse transform. Compare \vec{x} = R^{\top}\vec{y} + (-R^{\top}\vec{t}) with the rigid form R'\vec{y} + \vec{t}'. The inverse is itself rigid, with rotation R^{\top} and translation -R^{\top}\vec{t}:

M^{-1} = \big(\,R^{\top},\; -R^{\top}\vec{t}\,\big).
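Step 4 translates directly into code. This is a minimal sketch assuming the same nested-list layout, with the translation in the last column of a 4×4 homogeneous matrix; rigid_inverse and to_homogeneous are hypothetical helper names, not an engine API.

```python
import math

def rigid_inverse(R, t):
    """4x4 homogeneous inverse of x -> R x + t, assuming R is a rotation.

    Step 4: the inverse has rotation R^T and translation -R^T t.
    """
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]                # R^T
    tp = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]   # -R^T t
    return [Rt[i] + [tp[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def to_homogeneous(R, t):
    """Pack (R, t) into a 4x4 matrix with the translation in the last column."""
    return [list(R[i]) + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def matmul4(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

c, s = math.cos(math.radians(30)), math.sin(math.radians(30))
R = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
t = [2.0, 3.0, 4.0]

M = to_homogeneous(R, t)
M_inv = rigid_inverse(R, t)
product = matmul4(M_inv, M)   # should be the 4x4 identity up to rounding
```

No elimination, no pivoting: just a transpose and one matrix-vector product for the new translation column.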

Step 5 — read it as a recipe. "Rotate back, then shift back." Rotating by R^{\top} undoes the orientation, and adding the offset -R^{\top}\vec{t} slides the origin home. (Equivalently, as in Step 2: subtract \vec{t} first, then rotate by R^{\top}.) The whole inverse costs one transpose and one matrix–vector product, which is pennies next to a general inverse.
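The recipe suggests a practical pattern: compute R^{\top} and the offset -R^{\top}\vec{t} once per object, after which every world-to-local query is one matrix-vector product plus a vector add. A sketch under the same assumptions (make_inverse, apply_inverse, and subtract_then_rotate are illustrative names), which also checks that the two orderings agree:

```python
import math

def make_inverse(R, t):
    """Precompute the inverse pair (R^T, -R^T t) once per object."""
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    offset = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return Rt, offset

def apply_inverse(Rt, offset, y):
    """Rotate back by R^T, then shift back by the precomputed -R^T t."""
    return [sum(Rt[i][k] * y[k] for k in range(3)) + offset[i] for i in range(3)]

def subtract_then_rotate(R, t, y):
    """The Step 2 ordering: R^T (y - t)."""
    d = [y[k] - t[k] for k in range(3)]
    return [sum(R[k][i] * d[k] for k in range(3)) for i in range(3)]

c, s = math.cos(math.radians(60)), math.sin(math.radians(60))
R = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]   # 60 degrees about y
t = [1.0, -2.0, 0.5]
Rt, offset = make_inverse(R, t)

# The last world point is t itself, the object's origin placed in the world.
world_points = [[0.0, 0.0, 0.0], [3.0, 1.0, -2.0], [1.0, -2.0, 0.5]]
local_points = [apply_inverse(Rt, offset, p) for p in world_points]
```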

Step 6 — name the most important instance: the view matrix. A camera is a rigid body too. Its world transform M_{\text{cam}} = (R, \vec{t}) says where the camera is. Rendering needs the opposite — every object expressed in the camera's frame — so the view matrix is precisely the camera's inverse:

V = M_{\text{cam}}^{-1} = \big(\,R^{\top},\; -R^{\top}\vec{t}\,\big).

That is why moving the camera right shoves the whole world left on screen: the view matrix is the camera's world transform run backwards, exactly as derived in Step 4.
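A sketch of Step 6 in plain Python, assuming R's columns are the camera's axes expressed in world space, \vec{t} is the camera position, and the camera looks down its local -z axis (an OpenGL-style convention, assumed here rather than fixed by the text). It checks the "move right, world goes left" claim and that the camera itself lands at the view-space origin. The helper names are hypothetical.

```python
import math

def view_matrix(R, t):
    """View matrix as the camera's rigid inverse (rotation R^T, translation -R^T t).

    R: 3x3 camera rotation (columns = camera axes in world space).
    t: camera position in world space.
    Returns a 4x4 homogeneous matrix with the translation in the last column.
    """
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    tp = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [Rt[i] + [tp[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def transform_point(M, p):
    """Apply a 4x4 homogeneous matrix to a 3D point (w = 1)."""
    h = list(p) + [1.0]
    return [sum(M[i][k] * h[k] for k in range(4)) for i in range(3)]

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
tree = [0.0, 0.0, -10.0]   # a fixed world point in front of the camera

# Move the camera 2 units to the right (+x): the tree's view-space x drops by 2.
before = transform_point(view_matrix(identity, [0.0, 0.0, 0.0]), tree)
after = transform_point(view_matrix(identity, [2.0, 0.0, 0.0]), tree)

# For any camera pose, the camera's own position lands at the view-space origin.
c, s = math.cos(math.radians(90)), math.sin(math.radians(90))
yaw = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]   # 90 degrees about y
cam_pos = [5.0, 1.0, -3.0]
eye = transform_point(view_matrix(yaw, cam_pos), cam_pos)
```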

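In summary: let M\vec{x} = R\vec{x} + \vec{t} be a rigid (rotation + translation) transform with R orthogonal. Then M^{-1} is rigid as well, with M^{-1}\vec{y} = R^{\top}\vec{y} - R^{\top}\vec{t} = R^{\top}(\vec{y} - \vec{t}).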
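To confirm that (R^{\top}, -R^{\top}\vec{t}) is a two-sided inverse, compose the maps in both orders. The first line uses R^{\top}R = I; the second uses RR^{\top} = I, which also holds because R is square:

M^{-1}(M\vec{x}) = R^{\top}(R\vec{x} + \vec{t}) - R^{\top}\vec{t} = R^{\top}R\,\vec{x} = \vec{x},

M(M^{-1}\vec{y}) = R\big(R^{\top}\vec{y} - R^{\top}\vec{t}\big) + \vec{t} = RR^{\top}\vec{y} - RR^{\top}\vec{t} + \vec{t} = \vec{y}.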

You have a crate sitting in the world at some jaunty angle, and a bullet at world position \vec{p}. Testing "is \vec{p} inside this rotated box?" against the tilted faces is fiddly trigonometry. The slick move is to stop fighting the rotation: pull the point into the crate's own frame with the inverse,

\vec{p}_{\text{local}} = M^{-1}\vec{p} = R^{\top}(\vec{p} - \vec{t}),

and now the box is axis-aligned, so the test collapses to three trivial comparisons on (x, y, z) = \vec{p}_{\text{local}}: |x| \le w, |y| \le h, |z| \le d, where w, h, d are the crate's half-extents (taking its local origin at its center). World \to local is the standard trick for collision, picking, and "what am I looking at?" ray queries: transform the hard problem into the frame where it is easy, using an inverse that, for a rigid body, costs almost nothing.
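The crate test as a minimal sketch: point_in_box is a hypothetical helper, and it assumes the crate's local origin sits at its center so the half-extents (w, h, d) bound |x|, |y|, |z| directly. The crate pose and the query points are made-up values for illustration.

```python
import math

def point_in_box(p, R, t, half_extents):
    """Return True if world point p lies inside a box posed by (R, t).

    Pulls p into the box frame with R^T (p - t), then does the
    axis-aligned test |x| <= w, |y| <= h, |z| <= d.
    """
    d = [p[k] - t[k] for k in range(3)]                                  # p - t
    local = [sum(R[k][i] * d[k] for k in range(3)) for i in range(3)]   # R^T (p - t)
    return all(abs(local[i]) <= half_extents[i] for i in range(3))

# Hypothetical crate: half-extents 1, yawed 45 degrees about z, centered at (10, 0, 0).
c = s = math.sqrt(0.5)
R = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
t = [10.0, 0.0, 0.0]
half = [1.0, 1.0, 1.0]

print(point_in_box([10.0, 0.0, 0.0], R, t, half))   # True: the crate's center
print(point_in_box([11.3, 0.0, 0.0], R, t, half))   # True: toward a rotated corner
print(point_in_box([11.5, 0.0, 0.0], R, t, half))   # False: past that corner
```

The second query is the instructive one: the point is 1.3 units from the center along world x, more than the half-width, yet it is inside, because the 45-degree yaw swings a corner out to \sqrt{2} \approx 1.41 along that axis. A naive axis-aligned test in world space would reject it.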

There and back again

[Interactive figure: a single point starts at the object origin. M pushes it out into the world (rotate by the chosen angle, then translate by \vec{t}), and M^{-1} brings it straight back (rotate by R^{\top}, then shift by -R^{\top}\vec{t}). At every rotation angle the round trip lands exactly where it began, which is M^{-1}M = I made visible.]
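The same round trip can be reproduced numerically. A 2D sketch (forward and inverse are illustrative names; angles are in radians) pushes the object origin out with M, brings it back with M^{-1}, and checks the result at twelve angles:

```python
import math

def forward(angle, t, x):
    """M: rotate the 2D point x by angle (radians), then translate by t."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * x[0] - s * x[1] + t[0], s * x[0] + c * x[1] + t[1])

def inverse(angle, t, y):
    """M^{-1}: rotate by R^T, then shift by -R^T t."""
    c, s = math.cos(angle), math.sin(angle)
    # R^T = [[c, s], [-s, c]] for R = [[c, -s], [s, c]].
    ry = (c * y[0] + s * y[1], -s * y[0] + c * y[1])   # R^T y
    rt = (c * t[0] + s * t[1], -s * t[0] + c * t[1])   # R^T t
    return (ry[0] - rt[0], ry[1] - rt[1])

t = (3.0, 1.5)
origin = (0.0, 0.0)
errors = []
for deg in range(0, 360, 30):
    a = math.radians(deg)
    out = forward(a, t, origin)    # pushed into the world (lands at t)
    back = inverse(a, t, out)      # brought home by the inverse
    errors.append(math.hypot(back[0] - origin[0], back[1] - origin[1]))

print(max(errors) < 1e-12)  # True: the round trip returns to the origin at every angle
```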