Affine Transformations

A linear transformation can rotate, scale and shear, but it always pins the origin in place. An affine transformation is the natural upgrade: a linear map followed by a translation. It can do everything a linear map can, plus slide.

\vec{x} \;\longmapsto\; M\vec{x} + \vec{t}.

Here M is the linear part (a rotation, scaling, shear or any combination) and \vec{t} is the translation. Using homogeneous coordinates, this two-step recipe collapses into a single matrix multiply.
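Before switching to homogeneous coordinates, here is a minimal sketch of the two-step recipe applied directly, in plain Python with no libraries. The helper name `affine_apply` is hypothetical, and the example assumes M is an exact 90° counter-clockwise rotation and \vec{t} = (3, 1). Unlike a linear map, it does not keep the origin in place: the origin lands on \vec{t}.

```python
def affine_apply(M, t, x):
    """Two-step affine map x -> M x + t for a 2x2 M (list of rows) and 2-vectors t, x."""
    # Step 1: the linear part M x.
    mx = [M[0][0] * x[0] + M[0][1] * x[1],
          M[1][0] * x[0] + M[1][1] * x[1]]
    # Step 2: the translation slides the result by t.
    return [mx[0] + t[0], mx[1] + t[1]]

R90 = [[0, -1],
       [1,  0]]   # exact 90-degree counter-clockwise rotation
t = [3, 1]

print(affine_apply(R90, t, [1, 0]))  # [3, 2]
print(affine_apply(R90, t, [0, 0]))  # [3, 1]: the origin moves to t
```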

One matrix for "rotate then slide"

Step 1 — assemble the homogeneous matrix. Place the linear part M in the top-left block, the translation \vec{t} in the last column, and a bottom row [\,0 \; \cdots \; 0 \; 1\,]. In 2-D (so a 3\times 3 matrix):

A = \begin{bmatrix} m_{11} & m_{12} & t_x \\ m_{21} & m_{22} & t_y \\ 0 & 0 & 1 \end{bmatrix} = \left[\begin{array}{c|c} M & \vec{t} \\ \hline \vec{0}^{\top} & 1 \end{array}\right].
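Step 1 can be sketched in a few lines of Python. The function name `to_homogeneous` is an illustrative assumption; as written it works in any dimension n, appending \vec{t} as an extra column and a bottom row of zeros ending in 1.

```python
def to_homogeneous(M, t):
    """Pack an n x n linear part M and an n-vector t into the (n+1) x (n+1) matrix [[M, t], [0, 1]]."""
    n = len(M)
    rows = [list(M[i]) + [t[i]] for i in range(n)]   # top block: [M | t]
    rows.append([0] * n + [1])                        # bottom row: [0 ... 0 1]
    return rows

A = to_homogeneous([[0, -1], [1, 0]], [3, 1])
for row in A:
    print(row)
# [0, -1, 3]
# [1, 0, 1]
# [0, 0, 1]
```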

Step 2 — apply it to the homogeneous point (x, y, 1).

A \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} m_{11} & m_{12} & t_x \\ m_{21} & m_{22} & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}.

Step 3 — multiply it out row by row.

= \begin{bmatrix} m_{11}x + m_{12}y + t_x \\ m_{21}x + m_{22}y + t_y \\ 1 \end{bmatrix}.

Step 4 — read off the top two rows. They are exactly M\vec{x} plus \vec{t}; the bottom row stays 1, so the output is still a point:

\begin{bmatrix} m_{11}x + m_{12}y \\ m_{21}x + m_{22}y \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix} = M\vec{x} + \vec{t}.

The single matrix A performs the entire affine map. The linear block rotates, scales and shears, and the translation column slides, all in one matrix–vector multiply.
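A quick numerical check of Steps 2 to 4, as a sketch with arbitrarily chosen numbers (M = [[2, 1], [0, 3]], \vec{t} = (5, -4), point (1, 2)); `mat_vec` is a hypothetical helper. The homogeneous product and the two-step formula agree on the top two entries, and the last entry stays 1.

```python
def mat_vec(A, v):
    """Matrix-vector product: one dot product per row (Step 3)."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

M = [[2, 1],
     [0, 3]]
t = [5, -4]
A = [[2, 1,  5],
     [0, 3, -4],
     [0, 0,  1]]   # M in the top-left block, t in the last column

x, y = 1, 2
homogeneous = mat_vec(A, [x, y, 1])            # Step 2: multiply (x, y, 1)
direct = [M[0][0] * x + M[0][1] * y + t[0],
          M[1][0] * x + M[1][1] * y + t[1]]    # M x + t computed the long way

print(homogeneous)  # [9, 2, 1]
print(direct)       # [9, 2]
```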

Composing affine maps = multiplying matrices

Step 5 — chain two affine maps the long way. Apply A_1(\vec{x}) = M_1\vec{x} + \vec{t}_1, then A_2(\vec{y}) = M_2\vec{y} + \vec{t}_2:

A_2(A_1(\vec{x})) = M_2(M_1\vec{x} + \vec{t}_1) + \vec{t}_2 = (M_2 M_1)\vec{x} + (M_2 \vec{t}_1 + \vec{t}_2).

Step 6 — recognise the result as one affine map. The composite has linear part M_2 M_1 and translation M_2\vec{t}_1 + \vec{t}_2 — itself of the form M\vec{x} + \vec{t}. The bookkeeping is fiddly by hand.
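The bookkeeping of Steps 5 and 6 can be written as a small function. This is a sketch: `compose_affine` and the sample maps (A_1 a 90° rotation followed by a slide of (3, 1), A_2 a uniform scale by 2 followed by a slide of (0, 5)) are illustrative assumptions. Applying the composite in one go lands on the same point as applying A_1 and then A_2.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_vec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def affine(M, t, x):
    """x -> M x + t."""
    return [a + b for a, b in zip(mat_vec(M, x), t)]

def compose_affine(M2, t2, M1, t1):
    """(M, t) for x -> A2(A1(x)): linear part M2 M1, translation M2 t1 + t2."""
    return mat_mul(M2, M1), [a + b for a, b in zip(mat_vec(M2, t1), t2)]

M1, t1 = [[0, -1], [1, 0]], [3, 1]   # rotate 90 degrees, then slide by (3, 1)
M2, t2 = [[2, 0], [0, 2]], [0, 5]    # scale by 2, then slide by (0, 5)

M, t = compose_affine(M2, t2, M1, t1)
print(M, t)                                     # [[0, -2], [2, 0]] [6, 7]
print(affine(M2, t2, affine(M1, t1, [4, -1])))  # [8, 15]
print(affine(M, t, [4, -1]))                    # [8, 15]
```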

Step 7 — let the matrices do it. In homogeneous form, composition is just the matrix product A_2 A_1, and multiplying the blocks reproduces Step 6 automatically:

\left[\begin{array}{c|c} M_2 & \vec{t}_2 \\ \hline \vec{0}^{\top} & 1 \end{array}\right] \left[\begin{array}{c|c} M_1 & \vec{t}_1 \\ \hline \vec{0}^{\top} & 1 \end{array}\right] = \left[\begin{array}{c|c} M_2 M_1 & M_2\vec{t}_1 + \vec{t}_2 \\ \hline \vec{0}^{\top} & 1 \end{array}\right].
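To see Step 7 numerically, here is a sketch using illustrative maps written directly as 3×3 homogeneous matrices: A_1 is a 90° rotation followed by a slide of (3, 1), and A_2 a scale by 2 followed by a slide of (0, 5). The helper `mat_mul` is hypothetical. The ordinary product A_2 A_1 comes out with M_2 M_1 in the top-left block and M_2\vec{t}_1 + \vec{t}_2 = (6, 7) in the last column, exactly as the block formula predicts.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

A1 = [[0, -1, 3],
      [1,  0, 1],
      [0,  0, 1]]   # M1 = 90-degree rotation, t1 = (3, 1)
A2 = [[2, 0, 0],
      [0, 2, 5],
      [0, 0, 1]]    # M2 = scale by 2, t2 = (0, 5)

A = mat_mul(A2, A1)   # "apply A1 first" means A1 sits on the right
for row in A:
    print(row)
# [0, -2, 6]
# [2, 0, 7]
# [0, 0, 1]
```

Reversing the factors, `mat_mul(A1, A2)`, gives a different matrix, which is the matrix form of "order matters".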

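So a whole chain (rotate, then scale, then translate) collapses into one matrix. Because the map applied first sits rightmost in the product, that chain is A = T\,S\,R, where R, S and T are the homogeneous rotation, scaling and translation matrices. The three multiply together once, and from then on every point of your shape is transformed by a single matrix–vector multiply. This is a large part of why graphics hardware and APIs are built around matrix products.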
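A sketch of such a chain on a unit square, assuming a 90° rotation R, a uniform scale S by 2 and a translation T by (3, 1) (all illustrative choices). The three are multiplied once into A = T S R, and each corner then costs one matrix–vector multiply. Swapping the order gives a genuinely different map.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_vec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

R = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]   # rotate 90 degrees counter-clockwise
S = [[2, 0, 0], [0, 2, 0], [0, 0, 1]]    # scale by 2
T = [[1, 0, 3], [0, 1, 1], [0, 0, 1]]    # slide by (3, 1)

A = mat_mul(T, mat_mul(S, R))   # rotate first, so R is rightmost

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
moved = [tuple(mat_vec(A, [x, y, 1])[:2]) for x, y in square]
print(moved)  # [(3, 1), (3, 3), (1, 3), (1, 1)]

wrong_order = mat_mul(R, mat_mul(S, T))   # translate first, then scale, then rotate
print(mat_vec(wrong_order, [0, 0, 1]))    # [-2, 6, 1], not [3, 1, 1]
```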

An affine map sends \vec{x} \mapsto M\vec{x} + \vec{t}. Then:

It is a linear map followed by a translation — every linear map is affine (with \vec{t} = \vec{0}), but not conversely.
In homogeneous coordinates it is the single matrix A = \left[\begin{smallmatrix} M & \vec{t} \\ \vec{0}^{\top} & 1 \end{smallmatrix}\right] acting on (\vec{x}, 1).
Composition is matrix multiplication: A_2 A_1 is again affine, with linear part M_2 M_1 and translation M_2\vec{t}_1 + \vec{t}_2.
It preserves lines and parallelism (and ratios of lengths along a line): straight stays straight, parallel stays parallel, provided M is invertible (a singular M can collapse a line to a single point).
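The last two properties can be spot-checked numerically. This is a sketch assuming an illustrative map f with a shear M = [[1, 2], [0, 1]] and \vec{t} = (4, -1); `Fraction` from the standard library keeps the arithmetic exact. The check works because directions transform by M alone: in f(\vec{b}) - f(\vec{a}) = M(\vec{b} - \vec{a}) the translation cancels.

```python
from fractions import Fraction

def f(p):
    """Illustrative affine map: shear M = [[1, 2], [0, 1]], then slide by t = (4, -1)."""
    x, y = p
    return (x + 2 * y + 4, y - 1)

def lerp(p, q, s):
    """Point a fraction s of the way from p to q."""
    return (p[0] + s * (q[0] - p[0]), p[1] + s * (q[1] - p[1]))

def direction(a, b):
    return (b[0] - a[0], b[1] - a[1])

def cross(u, v):
    """2-D cross product; zero exactly when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]

p, q = (0, 0), (2, 4)
s = Fraction(1, 4)
# Ratios along a line: the point 1/4 of the way from p to q lands 1/4 of the way from f(p) to f(q).
print(f(lerp(p, q, s)) == lerp(f(p), f(q), s))   # True

# Parallelism: segment a->b has the same direction (2, 4) as p->q, and the images stay parallel.
a, b = (1, 0), (3, 4)
d1, d2 = direction(f(p), f(q)), direction(f(a), f(b))
print(d1, d2, cross(d1, d2))   # (10, 4) (10, 4) 0
```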

In 3-D, an affine map is represented by a 4\times 4 matrix: a 3\times 3 linear block, a translation column, and the bottom row (0, 0, 0, 1). Graphics engines call the one that places an object in the world its model matrix:

A = \left[\begin{array}{c|c} M_{3\times 3} & \vec{t} \\ \hline \vec{0}^{\top} & 1 \end{array}\right].
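In 3-D the same packing gives the 4×4 form above. As a sketch, assume an illustrative linear block that turns 90° about the z-axis and scales uniformly by 2, and \vec{t} = (10, 0, -5); `to_homogeneous` and `mat_vec` are hypothetical helpers.

```python
def to_homogeneous(M, t):
    """Pack a 3x3 linear block M and a translation t into a 4x4 matrix [[M, t], [0, 0, 0, 1]]."""
    rows = [list(M[i]) + [t[i]] for i in range(3)]
    rows.append([0, 0, 0, 1])
    return rows

def mat_vec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

M3 = [[0, -2, 0],
      [2,  0, 0],
      [0,  0, 2]]   # rotate 90 degrees about z, then scale by 2
t = [10, 0, -5]

A = to_homogeneous(M3, t)
print(A[3])                       # [0, 0, 0, 1]
print(mat_vec(A, [1, 0, 0, 1]))   # [10, 2, -5, 1]
```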

A character is modelled around its own origin, then a single model matrix M_{\text{model}}, built by multiplying together a scale, a rotation and a translation, drops it into the scene at the right size, facing the right way, in the right place. Multiply that by the camera's view matrix V and the projection matrix P, and a vertex's journey from model space to clip space, one perspective divide (from the previous lesson) and a viewport scaling away from the screen, is

\vec{v}_{\text{clip}} = P \cdot V \cdot M_{\text{model}} \cdot \vec{v}_{\text{model}}.

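Three matrices, multiplied together once, then one matrix–vector multiply per vertex for millions of vertices a frame: the whole modern graphics pipeline rests on homogeneous matrices, with the model and view steps being affine maps in disguise and the perspective projection the one step that goes beyond affine.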
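An end-to-end sketch of that journey. Every matrix here is an illustrative assumption: the model matrix is T R S (scale by 2, turn 90° about z, slide by (1, 0, 0)), the hypothetical view matrix V slides the scene 4 units along +z, and P is a deliberately minimal pinhole projection that copies z into w, chosen for readability rather than taken from any particular API. The product P V M is formed once, each vertex then costs one multiply, and dividing by w performs the perspective divide.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_vec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

S = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 1]]   # scale by 2
R = [[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # 90 degrees about z
T = [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]   # slide by (1, 0, 0)
V = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 4], [0, 0, 0, 1]]   # hypothetical view: slide 4 along +z
P = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0]]   # hypothetical pinhole: w = z

model = mat_mul(T, mat_mul(R, S))     # scale first, then rotate, then translate
PVM = mat_mul(P, mat_mul(V, model))   # built once per object

vertices = [[1, 0, 0, 1], [0, 1, 0, 1]]
for v in vertices:
    clip = mat_vec(PVM, v)                 # one multiply per vertex
    x, y, z, w = clip
    print(clip, (x / w, y / w, z / w))     # perspective divide
# [1, 2, 4, 4] (0.25, 0.5, 1.0)
# [-1, 0, 4, 4] (-0.25, 0.0, 1.0)
```

Because matrix multiplication is associative, the premultiplied PVM gives exactly the same clip coordinates as applying the model, view and projection matrices one after another.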