A Matrix Times a Vector

Here is the operation everything has been building towards. A matrix multiplies a vector to produce a new vector. The mechanical rule is "rows dot the vector":

\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} ax + by \\ cx + dy \end{bmatrix}.
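To make the row rule concrete, here is a minimal sketch in plain Python. The function name `matvec_rows` and the sample numbers are illustrative choices, not part of the lesson; the matrix is assumed to be stored as a list of rows.

```python
def matvec_rows(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector using the row rule."""
    result = []
    for row in matrix:
        # The dot product of this row with the vector gives one output entry.
        result.append(sum(entry * weight for entry, weight in zip(row, vector)))
    return result


# Illustrative values: a = 2, b = 1, c = 0, d = 3 and x = 4, y = 5.
A = [[2, 1],
     [0, 3]]
x = [4, 5]

print(matvec_rows(A, x))  # [2*4 + 1*5, 0*4 + 3*5] = [13, 15]
```

Each output entry comes from exactly one row, which is why this view is handy for computing by hand.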

That works, but it hides the meaning. Write A for the matrix and \vec{x} for the vector with entries x and y. The real story is this: the output A\vec{x} is a linear combination of the columns of A, weighted by the entries of \vec{x}:

A\vec{x} = x\begin{bmatrix} a \\ c \end{bmatrix} + y\begin{bmatrix} b \\ d \end{bmatrix}.
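The column reading can be checked the same way. The sketch below builds the output by scaling each column and adding the results; `matvec_columns` is a hypothetical helper name, and the numbers are made up for illustration.

```python
def matvec_columns(matrix, vector):
    """Build the product as a weighted sum of the matrix's columns."""
    n_rows = len(matrix)
    result = [0] * n_rows
    for j, weight in enumerate(vector):
        # Column j of the matrix, read top to bottom.
        column = [matrix[i][j] for i in range(n_rows)]
        # Add weight * column to the running total.
        result = [total + weight * entry for total, entry in zip(result, column)]
    return result


# Illustrative values: columns [2, 0] and [1, 3], weights x = 4 and y = 5.
A = [[2, 1],
     [0, 3]]
x = [4, 5]

# 4 * [2, 0] + 5 * [1, 3] = [8, 0] + [5, 15] = [13, 15]
print(matvec_columns(A, x))
```

The result matches what the row rule gives for the same numbers: the two formulas are the same arithmetic, grouped differently.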

Columns, weighted by the vector

The two faint arrows are the columns of A. The sliders are the entries x and y of the input vector — the weights. The bold arrow is the output A\vec{x}: column one scaled by x, plus column two scaled by y. This single idea — "a matrix mixes its columns" — is the one to carry into everything ahead.

Why this is the whole game

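Reading A\vec{x} as "weighted columns" explains why matrices appear in so many fields: whenever an output is a weighted mix of fixed ingredients, it can be written as a matrix times a vector. The row view matters too. The weighted sum inside a neuron is a row dotted with its inputs, so the weighted sums for a whole layer are one matrix–vector product. And in the next stage we'll see A\vec{x} as a transformation: a matrix is a verb that moves every vector at once.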
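As a rough illustration of the neuron remark, the sketch below treats each row of a weight matrix as one neuron. The name `layer_weighted_sums`, the matrix `W`, and the input values are all hypothetical, and the sketch computes only the weighted sums, leaving out anything else a real layer might apply afterwards.

```python
def layer_weighted_sums(weights, inputs):
    """Each row of `weights` belongs to one neuron; return every neuron's weighted sum."""
    return [sum(w * v for w, v in zip(row, inputs)) for row in weights]


# Hypothetical layer: 3 neurons, each reading the same 2 inputs.
W = [[1, -1],
     [0, 2],
     [3, 1]]
inputs = [2, 5]

# [1*2 - 1*5, 0*2 + 2*5, 3*2 + 1*5] = [-3, 10, 11]
print(layer_weighted_sums(W, inputs))
```

The whole layer is a single product of `W` with `inputs`: three rows in, three numbers out.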