Forward Propagation

Forward propagation is how a network turns an input into a prediction: feed the feature vector into the first layer, pass that layer's output to the next, and so on until the last layer. Each layer multiplies its input by a weight matrix, adds a bias vector, and applies an activation function \sigma:

\vec{a}^{(1)} = \sigma(W_1\vec{x} + \vec{b}_1), \quad \vec{a}^{(2)} = \sigma(W_2\vec{a}^{(1)} + \vec{b}_2), \quad \dots

The output of one layer is the input of the next — just function composition. The whole network is one big function built by chaining these matrix multiplies and squashes together.
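The chaining described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the helper names (`sigmoid`, `layer`, `forward`), the choice of the logistic function for \sigma, and all weight values are assumptions made for the example.

```python
import math

def sigmoid(z):
    """Logistic squashing function, applied to one number (assumed choice of sigma)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(W, b, a):
    """One layer: sigma(W a + b), with W given as a list of rows."""
    return [sigmoid(sum(w * x for w, x in zip(row, a)) + bias)
            for row, bias in zip(W, b)]

def forward(layers, x):
    """Chain the layers: each layer's output becomes the next layer's input."""
    a = x
    for W, b in layers:
        a = layer(W, b, a)
    return a

# Hypothetical weights, chosen only for illustration (2 inputs -> 3 hidden -> 1 output).
W1 = [[0.5, -0.5], [1.0, 1.0], [-1.0, 0.5]]
b1 = [0.0, -1.0, 0.5]
W2 = [[1.0, -1.0, 0.5]]
b2 = [0.0]
layers = [(W1, b1), (W2, b2)]

x = [1.0, 1.0]
prediction = forward(layers, x)

# The loop is the same computation as the nested call f2(f1(x)).
nested = layer(W2, b2, layer(W1, b1, x))
```

Here `forward` is nothing more than the composition written as a loop, so adding a third layer means appending one more `(W, b)` pair to `layers`.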

Push the signal through

Step the input forward, stage by stage. The 2-number input becomes a 3-number hidden activation (via W_1), which becomes the final output (via W_2). Each stage is a matrix times the previous vector, plus a bias, squashed by the activation: the same operation, repeated.
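Tracking the shapes makes each stage concrete. As a sketch, write m for the length of the final output (the text does not fix it, so m is an assumption introduced here):

\vec{x} \in \mathbb{R}^{2}, \quad W_1 \in \mathbb{R}^{3 \times 2}, \quad \vec{b}_1 \in \mathbb{R}^{3} \;\Rightarrow\; \vec{a}^{(1)} \in \mathbb{R}^{3}; \qquad W_2 \in \mathbb{R}^{m \times 3}, \quad \vec{b}_2 \in \mathbb{R}^{m} \;\Rightarrow\; \vec{a}^{(2)} \in \mathbb{R}^{m}

The number of columns of each matrix must match the length of the vector it multiplies, and the number of rows sets the length of the result.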

Fast, but only half the story

Forward propagation is cheap and parallel, just a few matrix multiplies, which is why a trained network can label an image in milliseconds. Run a whole batch of inputs at once and each layer becomes a single matrix–matrix multiply. But this only uses a network; it doesn't teach it. For that we need to measure the error and send it backwards: first by mapping out the loss landscape, then by backpropagation.
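The batching remark above can be checked with a small sketch. It stacks several inputs as the columns of a matrix, pushes them through each layer with one matrix–matrix product, and compares the result with running the inputs one at a time. The helper names, the logistic activation, the weights, and the example inputs are all hypothetical. In practice a numerical library would do the multiply; plain Python is used here only to keep the example self-contained.

```python
import math

def sigmoid(z):
    """Logistic squashing function, applied to one number (assumed choice of sigma)."""
    return 1.0 / (1.0 + math.exp(-z))

def matmul(A, B):
    """Matrix-matrix product; A and B are lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def layer_batch(W, b, A):
    """sigma(W A + b), where each column of A is one example."""
    Z = matmul(W, A)
    return [[sigmoid(z + bias) for z in row] for row, bias in zip(Z, b)]

def layer_single(W, b, a):
    """sigma(W a + b) for a single input vector a."""
    return [sigmoid(sum(w * x for w, x in zip(row, a)) + bias)
            for row, bias in zip(W, b)]

# Hypothetical weights for a 2 -> 3 -> 1 network (illustration only).
W1 = [[0.5, -0.5], [1.0, 1.0], [-1.0, 0.5]]
b1 = [0.0, -1.0, 0.5]
W2 = [[1.0, -1.0, 0.5]]
b2 = [0.0]

# Four made-up examples with two features each.
examples = [[1.0, 1.0], [0.0, 2.0], [-1.0, 0.5], [3.0, -2.0]]

# Stack the examples as columns: X has shape 2 x 4.
X = [list(col) for col in zip(*examples)]

# Whole batch: one matrix-matrix multiply per layer.
batch_out = layer_batch(W2, b2, layer_batch(W1, b1, X))  # shape 1 x 4

# Same inputs, one at a time, for comparison.
single_out = [layer_single(W2, b2, layer_single(W1, b1, x))[0]
              for x in examples]
```

Column j of `batch_out` should match the prediction for example j computed on its own, since the batch version only rearranges the same arithmetic.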