A Layer of Neurons

Put several neurons side by side, all reading the same inputs — that's a layer. Each neuron has its own row of weights, so its output is its row dotted with the input. Stack those rows and the whole layer is one matrix–vector product:

\vec{z} = W\vec{x} + \vec{b}, \qquad \vec{a} = \sigma(\vec{z}).

The weight matrix W has one row per neuron and one column per input. So everything you learned about a matrix acting on a vector is exactly what a neural-network layer does — the abstract maths and the practical model are the same object.

Each output is a row · input

Three inputs feed a layer of two neurons. Pick an output neuron and its connections light up: its value is that row of W dotted with the input vector, plus a bias. Two neurons, two dot products — and together they are precisely W\vec{x} + \vec{b}.

Why phrase it as a matrix

Because computers are devastatingly fast at matrix multiplication — it's exactly what GPUs are built for. Writing a layer as W\vec{x} means a whole layer of thousands of neurons runs as a single, blisteringly fast operation, and a whole batch of inputs becomes one matrix–matrix multiply. Linear algebra isn't just the notation for neural networks; it's the reason they're fast enough to be useful.