A Layer of Neurons
Put several neurons side by side, all reading the same inputs — that's a layer.
Each neuron has its own row of weights, so its value before the activation is that row dotted
with the input, plus its bias. Stack those rows and the whole layer is one matrix–vector product:
\vec{z} = W\vec{x} + \vec{b}, \qquad \vec{a} = \sigma(\vec{z}).
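The weight matrix W has one row per neuron and one column per input, \vec{b} holds one bias per
neuron, and \sigma is the activation function applied to each entry of \vec{z}. So everything you
learned about a matrix acting on a vector carries over directly to the linear part of a
neural-network layer: the abstract maths and the practical model are the same object.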
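To make the formula concrete, here is a minimal sketch of one layer in plain Python. The function names `sigmoid` and `layer_forward` and all the numbers are made up for illustration, and the logistic sigmoid is only one possible choice for \sigma; the text above does not fix a particular activation.

```python
import math

def sigmoid(z):
    """Logistic activation; an assumed choice for sigma."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(W, x, b):
    """Return (z, a) with z = W x + b and a = sigma(z).

    W is a list of rows (one row per neuron), x is the input vector,
    b holds one bias per neuron.
    """
    # Each neuron: its row of W dotted with x, plus its bias.
    z = [sum(w * v for w, v in zip(row, x)) + b_i
         for row, b_i in zip(W, b)]
    # The activation is applied to each entry separately.
    a = [sigmoid(z_i) for z_i in z]
    return z, a

# Hypothetical layer: two neurons reading three inputs.
W = [[1.0, 0.0, -1.0],
     [2.0, 1.0, 0.0]]
x = [3.0, 1.0, 2.0]
b = [0.5, -1.0]

z, a = layer_forward(W, x, b)
print(z)  # [1.5, 6.0]
```

The list comprehension is the row-by-row view spelled out. A numerical library would typically express the same computation as a single matrix–vector call.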
Each output is a row · input
Take three inputs feeding a layer of two neurons. Each output neuron is connected to all three
inputs, and its value before the activation is its row of W dotted with the input vector, plus
its bias. Two neurons, two dot products, and together they are precisely W\vec{x} + \vec{b}.
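Written out in full for this three-input, two-neuron case, the product looks like this. The entry names are notation assumed here for illustration: w_{ij} is the weight from input j to neuron i, x_j is the j-th input, and b_i and z_i are the bias and pre-activation value of neuron i.
\begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \begin{pmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = \begin{pmatrix} w_{11}x_1 + w_{12}x_2 + w_{13}x_3 + b_1 \\ w_{21}x_1 + w_{22}x_2 + w_{23}x_3 + b_2 \end{pmatrix}.
Each entry on the right is one neuron's dot product plus its bias, so reading the result row by row recovers the per-neuron picture.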
Why phrase it as a matrix
Because computers are extremely fast at matrix multiplication: it is the kind of highly parallel
arithmetic that GPUs are built for. Writing a layer as W\vec{x} + \vec{b} means a whole layer of
thousands of neurons runs as a single fast operation, and a whole batch of inputs becomes one
matrix–matrix multiply. Linear algebra isn't just the notation for neural networks; it's a large
part of the reason they're fast enough to be useful.
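The batch claim can be checked with a small sketch: stack several inputs as the columns of a matrix X, compute WX with the bias added to every column, and compare against running the layer on each input separately. The helper names `matvec` and `layer_batch` and the numbers are hypothetical. Plain Python is used only to show that the two computations agree; it says nothing about speed, which in practice comes from optimized matrix routines.

```python
def matvec(W, x, b):
    """One input: z = W x + b, computed row by row."""
    return [sum(w * v for w, v in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def layer_batch(W, X, b):
    """A batch: Z = W X + b, where X holds one input per column
    and the bias b_i is added to every entry of row i."""
    n_cols = len(X[0])
    return [[sum(row[k] * X[k][j] for k in range(len(row))) + b_i
             for j in range(n_cols)]
            for row, b_i in zip(W, b)]

# Hypothetical layer (two neurons, three inputs) and a batch of four inputs.
W = [[1.0, 0.0, -1.0],
     [2.0, 1.0, 0.0]]
b = [0.5, -1.0]
inputs = [[3.0, 1.0, 2.0],
          [0.0, 1.0, 1.0],
          [1.0, 1.0, 1.0],
          [-1.0, 2.0, 0.5]]

# Stack the inputs as columns: X is 3 x 4.
X = [list(col) for col in zip(*inputs)]

Z = layer_batch(W, X, b)                       # 2 x 4, one column per input
per_input = [matvec(W, x, b) for x in inputs]  # four separate layer evaluations
columns_of_Z = [list(col) for col in zip(*Z)]

print(columns_of_Z == per_input)  # True
```

Column j of Z is exactly the layer's output for input j, so one matrix–matrix product replaces a loop over the batch.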