Stacking Layers

A neural network is just layers stacked back to back: the outputs of one layer become the inputs of the next. The first layer reads the raw features (the input layer), the last produces the prediction (the output layer), and the layers in between are hidden layers — the network's private workspace.

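Each layer applies \vec{a} = \sigma(W\vec{x} + \vec{b}) and passes its result onward, where W is the layer's weight matrix, \vec{b} its bias vector, and \sigma a nonlinear activation applied element by element. The activation between layers is essential. Without it, two stacked layers give W_2(W_1\vec{x} + \vec{b}_1) + \vec{b}_2 = (W_2 W_1)\vec{x} + (W_2\vec{b}_1 + \vec{b}_2), which is just one layer with different weights, so even a hundred layers would collapse into a single linear (strictly, affine) map. With it, each added layer can bend the function in ways no single linear map can.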
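The collapse described above can be checked numerically. The sketch below uses only the Python standard library; the helper names (`matvec`, `matmul`, `layer`) and all weight values are made up for illustration, and the sigmoid is one assumed choice for \sigma.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Matrix product of A and B, both given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def layer(W, b, x, activation=None):
    """Compute activation(W x + b); with activation=None the layer stays linear."""
    z = add(matvec(W, x), b)
    return [activation(v) for v in z] if activation else z

# Illustrative weights and input (arbitrary values, not from the text).
W1, b1 = [[1.0, -2.0], [0.5, 1.0]], [0.5, -1.0]
W2, b2 = [[2.0, 1.0], [-1.0, 3.0]], [0.0, 1.0]
x = [1.0, 2.0]

# Two layers with no activation in between...
stacked = layer(W2, b2, layer(W1, b1, x))

# ...equal one layer with merged weights W2 W1 and bias W2 b1 + b2.
W_merged = matmul(W2, W1)
b_merged = add(matvec(W2, b1), b2)
merged = layer(W_merged, b_merged, x)

# Putting a sigmoid between the layers breaks the equality at this input.
nonlinear = layer(W2, b2, layer(W1, b1, x, sigmoid))

print(stacked, merged)  # [-3.5, 8.0] [-3.5, 8.0]
print(nonlinear)
```

The first two results agree exactly, while the third does not match the merged layer, which is the practical meaning of "the activation is essential".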

Anatomy of a network

Here is a small network: three inputs, a hidden layer, and two outputs, fully connected. Follow the signal as it flows left to right: input, hidden, output. Each arrow is a weight; each layer is one matrix multiply plus a bias, followed by an activation.
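A minimal sketch of that signal flow in plain Python follows. The text does not give the hidden layer's width, so the 4 hidden neurons here are an assumption, as are the random weights, the zero biases, the sigmoid activation, and the helper names `dense` and `random_layer`.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def dense(W, b, x):
    """One fully connected layer: sigmoid(W x + b), one output per row of W."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bias)
            for row, bias in zip(W, b)]

def random_layer(n_out, n_in, rng):
    """Hypothetical initializer: uniform weights in [-1, 1], zero biases."""
    W = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b

rng = random.Random(0)                  # fixed seed so runs are repeatable
N_INPUT, N_HIDDEN, N_OUTPUT = 3, 4, 2   # hidden width of 4 is assumed

W_hidden, b_hidden = random_layer(N_HIDDEN, N_INPUT, rng)
W_out, b_out = random_layer(N_OUTPUT, N_HIDDEN, rng)

x = [0.5, -1.0, 2.0]                    # three raw input features
hidden = dense(W_hidden, b_hidden, x)   # input  -> hidden
output = dense(W_out, b_out, hidden)    # hidden -> output

# Fully connected: every input feeds every hidden neuron, and so on.
n_weights = N_INPUT * N_HIDDEN + N_HIDDEN * N_OUTPUT

print("hidden:", hidden)
print("output:", output)
print("arrows (weights):", n_weights)   # 3*4 + 4*2 = 20
```

Each weight matrix has one row per neuron in the layer it feeds and one column per neuron in the layer before it, which is why the arrow count is simply the sum of the two matrix sizes.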

Why hidden layers earn their name

The hidden layers learn their own useful features out of the raw inputs, combinations no human specified. Early layers might detect edges in an image, later ones shapes, later still whole objects. A famous result, the universal approximation theorem, says a network with even one hidden layer and a suitable nonlinear activation can approximate any continuous function on a closed, bounded input region as closely as we like, given enough neurons. The theorem only promises that such weights exist; it does not say how to find them. To make that promise practical we still need two things: a way to push data through (forward propagation) and a way to learn the weights (backpropagation).
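The "given enough neurons" part can be illustrated without any training. The sketch below hand-builds a one-hidden-layer network that approximates sin(x) on [0, \pi] by joining straight-line pieces. It assumes a ReLU activation and a hypothetical helper `build_relu_net`; it is one convenient construction for intuition, not the theorem's proof and not how networks are fitted in practice.

```python
import math

def relu(v):
    return max(0.0, v)

def build_relu_net(f, lo, hi, n_hidden):
    """Hand-set weights so the network matches f at n_hidden + 1 evenly
    spaced knots and is a straight line between neighbouring knots."""
    knots = [lo + (hi - lo) * i / n_hidden for i in range(n_hidden + 1)]
    values = [f(t) for t in knots]
    slopes = [(values[i + 1] - values[i]) / (knots[i + 1] - knots[i])
              for i in range(n_hidden)]
    # Hidden neuron i computes relu(x - knot_i); its output weight is the
    # change in slope that takes effect at that knot.
    out_weights = [slopes[0]] + [slopes[i] - slopes[i - 1]
                                 for i in range(1, n_hidden)]
    hidden_biases = [-t for t in knots[:-1]]
    out_bias = values[0]

    def net(x):
        hidden = [relu(x + b) for b in hidden_biases]   # hidden layer
        return out_bias + sum(c * h for c, h in zip(out_weights, hidden))

    return net

def max_error(f, net, lo, hi, samples=1000):
    """Largest absolute gap between f and net on an evenly spaced grid."""
    grid = (lo + (hi - lo) * k / samples for k in range(samples + 1))
    return max(abs(f(x) - net(x)) for x in grid)

errors = {}
for n in (4, 16, 64):
    net = build_relu_net(math.sin, 0.0, math.pi, n)
    errors[n] = max_error(math.sin, net, 0.0, math.pi)
    print(n, "hidden neurons -> max error", errors[n])
```

The worst-case error shrinks as hidden neurons are added, which is the behaviour the theorem guarantees in general.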