A neural network is just layers stacked back to back: the outputs of one layer become the inputs of the next.
Each layer applies a linear map to its inputs, then a nonlinear activation.
Here is a small network: three inputs, a hidden layer, and two outputs, fully connected. The signal flows left to right, from the input layer through the hidden layer to the output layer. Each arrow is a weight; each layer is one matrix multiply followed by an activation.
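
To make "one matrix multiply followed by an activation" concrete, here is a minimal sketch of that small network in plain Python. Several details are assumptions made for illustration and do not come from the text: the hidden layer has four neurons, each layer adds a bias term, the hidden activation is ReLU, the output layer uses the identity, and the weight values are arbitrary hand-picked numbers rather than learned ones.

```python
def relu(z):
    """Assumed hidden activation: pass positive values, clamp negatives to 0."""
    return max(0.0, z)


def identity(z):
    """Assumed output activation: leave the value unchanged."""
    return z


def dense(weights, biases, inputs, activation):
    """One layer: a matrix multiply (plus an assumed bias), then an activation.

    `weights` has one row per output neuron; each entry in a row is the
    weight on one incoming arrow.
    """
    outputs = []
    for row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + b
        outputs.append(activation(z))
    return outputs


# Hypothetical weights: 3 inputs -> 4 hidden neurons -> 2 outputs.
W1 = [[0.5, -1.0, 0.0],
      [1.0, 1.0, 1.0],
      [-0.5, 0.0, 0.5],
      [0.0, 2.0, -1.0]]
b1 = [0.0, -1.0, 0.5, 0.0]

W2 = [[1.0, 0.0, -1.0, 0.5],
      [0.0, 1.0, 1.0, -0.5]]
b2 = [0.1, -0.1]


def forward(x):
    """Push one input vector through the network, left to right."""
    hidden = dense(W1, b1, x, relu)
    return dense(W2, b2, hidden, identity)


x = [1.0, 2.0, 3.0]
hidden = dense(W1, b1, x, relu)   # input layer -> hidden layer
output = forward(x)               # input -> hidden -> output

print("hidden:", hidden)
print("output:", output)
```

The first hidden neuron computes a negative sum for this input, so ReLU sets it to zero and it contributes nothing to the outputs. That is the kind of on/off behaviour that lets different hidden neurons respond to different input patterns.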
The hidden layers learn their own useful features out of the raw inputs — combinations no
human specified. Early layers might detect edges in an image, later ones shapes, later still whole
objects. A famous result, the universal approximation theorem, says a network with
even one hidden layer can approximate any continuous function on a bounded input range as closely as you like, given enough neurons and a suitable nonlinear activation. To make that
promise practical we still need two things: a way to push data through
(