Forward propagation is how a network turns an input into a prediction: feed the
feature vector into the first layer, pass its output to the next, and so on to the end. Each layer
is a
The output of one layer is the input of the next — just function composition. The whole network is one big function built by chaining these matrix multiplies and squashes together.
Step the input forward, stage by stage. The 2-number input becomes a 3-number hidden activation
(via
Forward propagation is cheap and parallel — a few matrix multiplies — which is why a trained
network can label an image in milliseconds. Run a whole batch of inputs at once and it
becomes a single matrix–matrix multiply. But this only uses a network; it doesn't teach
it. For that we need to measure the error and send it backwards — first by mapping out the