A linear transformation can rotate, scale and shear, but it always pins the origin in place. An affine
transformation is the natural upgrade: a linear map followed by a
translation. It is everything linear maps can do, plus the ability to slide.
\vec{x} \;\longmapsto\; M\vec{x} + \vec{t}.
Here M is the linear part (a rotation, scaling, shear or any
combination) and \vec{t} is the translation. Using homogeneous coordinates,
this two-step recipe collapses into a single matrix multiply.
One matrix for "rotate then slide"
Step 1 — assemble the homogeneous matrix. Place the linear part
M in the top-left block, the translation
\vec{t} in the last column, and a bottom row
[\,0 \; \cdots \; 0 \; 1\,]. In 2-D (so a
3\times 3 matrix):
A = \begin{bmatrix} m_{11} & m_{12} & t_x \\ m_{21} & m_{22} & t_y \\ 0 & 0 & 1 \end{bmatrix} = \left[\begin{array}{c|c} M & \vec{t} \\ \hline \vec{0}^{\top} & 1 \end{array}\right].
Step 2 — apply it to the homogeneous point
(x, y, 1).
A \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} m_{11} & m_{12} & t_x \\ m_{21} & m_{22} & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}.
Step 3 — multiply it out row by row.
= \begin{bmatrix} m_{11}x + m_{12}y + t_x \\ m_{21}x + m_{22}y + t_y \\ 1 \end{bmatrix}.
Step 4 — read off the top two rows. They are exactly
M\vec{x} plus \vec{t}; the bottom row
stays 1, so the output is still a point:
\begin{bmatrix} m_{11}x + m_{12}y \\ m_{21}x + m_{22}y \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix} = M\vec{x} + \vec{t}.
The single matrix A performs the entire affine map. The linear
block bends and the translation column slides, all in one
matrix–vector multiply.
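Steps 1 to 4 can be checked numerically. The following is a minimal sketch in plain Python, assuming column vectors and nested lists for matrices; the helper names `affine_matrix` and `apply` and the sample numbers are illustrative choices, not part of the lesson.

```python
def affine_matrix(M, t):
    """Assemble the 3x3 homogeneous matrix [[M, t], [0, 0, 1]] for a 2-D affine map."""
    return [
        [M[0][0], M[0][1], t[0]],
        [M[1][0], M[1][1], t[1]],
        [0.0, 0.0, 1.0],
    ]

def apply(A, p):
    """Multiply the 3x3 matrix A by the homogeneous point (x, y, 1)."""
    h = (p[0], p[1], 1.0)
    return [sum(A[i][k] * h[k] for k in range(3)) for i in range(3)]

M = [[0.0, -1.0], [1.0, 0.0]]   # linear part: rotation by 90 degrees counter-clockwise
t = [3.0, 1.0]                  # translation
A = affine_matrix(M, t)

x, y = 2.0, 5.0
result = apply(A, (x, y))       # one matrix-vector multiply

# The same map done the two-step way: M x, then add t.
direct = [M[0][0] * x + M[0][1] * y + t[0],
          M[1][0] * x + M[1][1] * y + t[1]]

print(result)   # top two entries are M x + t, last entry stays 1
print(direct)
```

With these numbers the homogeneous result is (-2, 3, 1), and its top two entries agree with the two-step computation.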
Composing affine maps = multiplying matrices
Step 5 — chain two affine maps the long way. Apply
A_1(\vec{x}) = M_1\vec{x} + \vec{t}_1, then
A_2(\vec{y}) = M_2\vec{y} + \vec{t}_2:
A_2(A_1(\vec{x})) = M_2(M_1\vec{x} + \vec{t}_1) + \vec{t}_2 = (M_2 M_1)\vec{x} + (M_2 \vec{t}_1 + \vec{t}_2).
Step 6 — recognise the result as one affine map. The composite has linear
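part M_2 M_1 and translation
M_2\vec{t}_1 + \vec{t}_2, so it is itself of the form
M\vec{x} + \vec{t}. Note that the first translation \vec{t}_1 gets transformed by M_2 along the way, which is what makes the bookkeeping fiddly by hand.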
Step 7 — let the matrices do it. In homogeneous form,
composition is just the matrix product
A_2 A_1, and multiplying the blocks reproduces Step 6
automatically:
\left[\begin{array}{c|c} M_2 & \vec{t}_2 \\ \hline \vec{0}^{\top} & 1 \end{array}\right] \left[\begin{array}{c|c} M_1 & \vec{t}_1 \\ \hline \vec{0}^{\top} & 1 \end{array}\right] = \left[\begin{array}{c|c} M_2 M_1 & M_2\vec{t}_1 + \vec{t}_2 \\ \hline \vec{0}^{\top} & 1 \end{array}\right].
So a whole chain (rotate, then scale, then translate) collapses into one
matrix. A rotation, a scaling and a slide are multiplied together once, with the first step rightmost in the product, and from then on every point of your shape is
transformed by a single multiply. That is a large part of why graphics hardware is built around
matrix products.
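The composition rule can be verified on a small example. This is a sketch under the same column-vector convention, with hypothetical helpers `affine`, `matmul` and `apply`; the two sample maps are arbitrary.

```python
def affine(M, t):
    """3x3 homogeneous matrix for the 2-D affine map x -> M x + t."""
    return [[M[0][0], M[0][1], t[0]],
            [M[1][0], M[1][1], t[1]],
            [0.0, 0.0, 1.0]]

def matmul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apply(A, p):
    """Apply a 3x3 homogeneous matrix to the 2-D point p and drop the trailing 1."""
    h = (p[0], p[1], 1.0)
    return tuple(sum(A[i][k] * h[k] for k in range(3)) for i in range(2))

A1 = affine([[0.0, -1.0], [1.0, 0.0]], [1.0, 0.0])   # rotate 90 degrees, then shift by (1, 0)
A2 = affine([[2.0, 0.0], [0.0, 2.0]], [0.0, 3.0])    # scale by 2, then shift by (0, 3)

composite = matmul(A2, A1)   # A1 acts first, so it sits on the right

p = (1.0, 1.0)
step_by_step = apply(A2, apply(A1, p))
one_shot = apply(composite, p)

# Translation column of the composite: should equal M2 t1 + t2 = (2, 3).
composite_translation = [composite[0][2], composite[1][2]]

# Swapping the order gives a different map in general.
swapped = matmul(A1, A2)
```

Here both routes send (1, 1) to (0, 5), and the translation column of `composite` is (2, 3), matching M_2\vec{t}_1 + \vec{t}_2. The swapped product `A1 A2` has translation (-2, 0), a reminder that the order of the factors matters.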
An affine map sends \vec{x} \mapsto M\vec{x} + \vec{t}. Then:
- It is a linear map followed by a translation: every linear map is affine
  (with \vec{t} = \vec{0}), but not conversely.
- In homogeneous coordinates it is the single matrix
  A = \left[\begin{smallmatrix} M & \vec{t} \\ \vec{0}^{\top} & 1 \end{smallmatrix}\right]
  acting on (\vec{x}, 1).
- Composition is matrix multiplication:
  A_2 A_1 is again affine, with linear part
  M_2 M_1 and translation
  M_2\vec{t}_1 + \vec{t}_2.
- It preserves lines and parallelism (and ratios of lengths along a line):
  straight stays straight, parallel stays parallel.
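The last property in the list can be spot-checked on one example. The sketch below assumes an arbitrary shear-plus-stretch for M and an arbitrary translation; `affine_apply` and `cross` are illustrative helpers, and a single example is a sanity check rather than a proof.

```python
def affine_apply(M, t, p):
    """Return M p + t for a 2-D point p."""
    return (M[0][0] * p[0] + M[0][1] * p[1] + t[0],
            M[1][0] * p[0] + M[1][1] * p[1] + t[1])

def cross(u, v):
    """2-D cross product; zero exactly when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]

M = [[1.0, 0.5], [0.0, 2.0]]   # a shear combined with a vertical stretch
t = (4.0, -1.0)

a, b = (0.0, 0.0), (2.0, 4.0)
c, d = (1.0, 0.0), (2.0, 2.0)                    # segment cd is parallel to ab
mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)     # midpoint of ab

fa, fb, fc, fd, fmid = (affine_apply(M, t, p) for p in (a, b, c, d, mid))

image_ab = (fb[0] - fa[0], fb[1] - fa[1])
image_cd = (fd[0] - fc[0], fd[1] - fc[1])
image_mid = ((fa[0] + fb[0]) / 2, (fa[1] + fb[1]) / 2)

print(cross(image_ab, image_cd))   # zero: the images are still parallel
print(fmid == image_mid)           # the midpoint maps to the midpoint
print(fa)                          # the origin moved to t, so the map is not linear
```

The image segments stay parallel and the midpoint stays a midpoint, while the origin lands on \vec{t} rather than staying put.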
In 3-D, an affine map is a 4\times 4 matrix: a
3\times 3 linear block, a translation column, and the bottom row
(0, 0, 0, 1). Graphics engines call the one that places an object
in the world its model matrix:
A = \left[\begin{array}{c|c} M_{3\times 3} & \vec{t} \\ \hline \vec{0}^{\top} & 1 \end{array}\right].
A character is modelled around its own origin, then a single model matrix — built by
multiplying together a scale, a rotation and a translation — drops it into the scene at the
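right size, facing the right way, in the right place. Multiply that by the camera's
view matrix and the projection matrix (which sets up the perspective divide from the
previous lesson), and a vertex's journey from model space towards the screen is
\vec{v}_{\text{screen}} = P \cdot V \cdot M \cdot \vec{v}_{\text{model}},
where M now stands for the whole model matrix (the A above), not just its linear block.
The three matrices can be multiplied together once, leaving one multiply per vertex for millions of vertices a frame.
The model and view matrices are affine maps wearing their homogeneous disguise; a perspective
projection matrix uses the same homogeneous machinery but changes the bottom row, so it is not itself affine.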
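A model matrix of the kind described here can be assembled in a few lines. This is a minimal sketch, assuming column vectors, a uniform scale, and a rotation about the z-axis; the helper names (`scaling`, `rotation_z`, `translation`, `transform`) are made up for illustration and do not come from any particular graphics API.

```python
import math

def matmul(A, B):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(A, v):
    """Apply a 4x4 matrix to the homogeneous point (x, y, z, 1)."""
    h = (v[0], v[1], v[2], 1.0)
    return [sum(A[i][k] * h[k] for k in range(4)) for i in range(4)]

def scaling(s):
    return [[s, 0.0, 0.0, 0.0],
            [0.0, s, 0.0, 0.0],
            [0.0, 0.0, s, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def rotation_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

# Scale first, then rotate, then translate: the first step sits rightmost.
model = matmul(translation(10.0, 0.0, 5.0),
               matmul(rotation_z(math.pi / 2), scaling(2.0)))

vertex = (1.0, 0.0, 0.0)          # a point in model space
placed = transform(model, vertex)  # the same point in world space
print(placed)
```

The vertex (1, 0, 0) is doubled in size, turned a quarter-turn about z, and then moved, ending up at approximately (10, 2, 5) with the trailing homogeneous coordinate still 1 (up to floating-point rounding in the cosine).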