A Matrix Times a Vector
Here is the operation everything has been building towards. A matrix
multiplies a vector to produce a new vector. The mechanical rule is "rows dot
the vector":
\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} ax + by \\ cx + dy \end{bmatrix}.
That works, but it hides the meaning. The real story is this: the output is a
linear combination of the matrix's columns, weighted by the entries of the
vector:
A\vec{x} = x\begin{bmatrix} a \\ c \end{bmatrix} + y\begin{bmatrix} b \\ d \end{bmatrix}.
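A short Python sketch can confirm that the row rule and the column view give the same answer. The helper names matvec_rows and matvec_columns and the sample numbers are made up for illustration; nothing here comes from a library.

```python
# Illustrative helpers; the names and numbers are assumptions, not from the text.

def matvec_rows(A, v):
    """Row rule: each output entry is one row of A dotted with v."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]

def matvec_columns(A, v):
    """Column view: add up the columns of A, each scaled by one entry of v."""
    n_rows = len(A)
    result = [0] * n_rows
    for j, weight in enumerate(v):
        column = [A[i][j] for i in range(n_rows)]  # j-th column of A
        result = [r + weight * c for r, c in zip(result, column)]
    return result

A = [[2, 1],
     [0, 3]]
v = [4, 5]

print(matvec_rows(A, v))     # [13, 15]
print(matvec_columns(A, v))  # [13, 15]
```

Both calls print the same vector: 4 times the first column (2, 0) plus 5 times the second column (1, 3) is (13, 15), which matches the two row dot products 2·4 + 1·5 and 0·4 + 3·5.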
Columns, weighted by the vector
The two faint arrows are the columns of A. The sliders are the entries x and y
of the input vector, which act as the weights. The bold arrow is the output
A\vec{x}: column one scaled by x, plus column two scaled by y.
This single idea — "a matrix mixes its columns" — is the one to carry into everything ahead.
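One way to check the idea by hand is to pick the simplest possible weights. With x = 1 and y = 0 the mix contains only column one, and with x = 0 and y = 1 it contains only column two:
A\begin{bmatrix} 1 \\ 0 \end{bmatrix} = 1\begin{bmatrix} a \\ c \end{bmatrix} + 0\begin{bmatrix} b \\ d \end{bmatrix} = \begin{bmatrix} a \\ c \end{bmatrix}, \qquad A\begin{bmatrix} 0 \\ 1 \end{bmatrix} = 0\begin{bmatrix} a \\ c \end{bmatrix} + 1\begin{bmatrix} b \\ d \end{bmatrix} = \begin{bmatrix} b \\ d \end{bmatrix}.
Every other input vector produces a blend of these two outputs.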
Why this is the whole game
Reading A\vec{x} as "weighted columns" goes a long way toward explaining why
matrices appear in so many fields. The row view matters too: the core of a
neuron is a row of weights dotted with its inputs (before any bias or
activation is applied), so a whole layer is one matrix–vector product. And in
the next stage we'll see A\vec{x} as a transformation: a matrix is a verb
that moves every vector at once.
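To make the neuron and layer remark concrete, here is a minimal Python sketch. It assumes the simplest possible setting: no bias term, no activation function, and invented weights. The function names neuron and layer are hypothetical and chosen only for this illustration.

```python
# A toy layer: weights and inputs are invented for illustration.
# Real neurons usually add a bias and an activation; both are omitted here.

def neuron(weights, inputs):
    """One neuron: a single row of weights dotted with the inputs."""
    return sum(w * x for w, x in zip(weights, inputs))

def layer(W, inputs):
    """One layer: every row of W acts as a neuron, so the result is W times inputs."""
    return [neuron(row, inputs) for row in W]

W = [[1, -1, 0],   # weights of the first neuron
     [2,  0, 1]]   # weights of the second neuron
inputs = [3, 1, 2]

print(neuron(W[0], inputs))  # 2
print(layer(W, inputs))      # [2, 8]
```

Stacking the neurons' weight rows into W is what turns many separate dot products into a single matrix–vector product.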