The Sigmoid Function

A linear model can output any real number, from -\infty to +\infty. But a probability must live between 0 and 1. The sigmoid function is the squasher that bends the whole number line into that range:

\sigma(z) = \frac{1}{1 + e^{-z}}.

A large positive z gives an output near 1, a large negative z gives an output near 0, and z = 0 sits exactly at 0.5. The output approaches 0 and 1 but never reaches them. This graceful S-shape is what turns a raw score into a probability.
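The formula can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. The function name sigmoid and the two-branch form are choices made here: for negative z the formula is rewritten as e^{z} / (1 + e^{z}), which is algebraically the same value but avoids overflow in the exponential.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid 1 / (1 + e^{-z}), evaluated without overflow."""
    if z >= 0:
        # exp(-z) is at most 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the equivalent form e^z / (1 + e^z);
    # exp(z) is below 1 here, so it cannot overflow either.
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Large negative, zero, and large positive scores.
for z in (-10.0, 0.0, 10.0):
    print(z, sigmoid(z))
```

In floating point, very large |z| can round the result to exactly 0.0 or 1.0, even though the mathematical function never reaches those values.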

The S-curve

Plotted, the sigmoid is an S-shaped curve. A steepness setting sharpens or softens the curve: a large steepness makes it snap almost like a hard switch, while a small one leaves it gently sloping. Whatever the steepness, it always stays trapped between 0 and 1 and passes through 0.5 at the centre.
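One way to model the steepness is as a factor k that multiplies the input, giving \sigma(kz). That parameterisation is an assumption made for this sketch, since the text does not say how the steepness is defined, and the names steep_sigmoid and k are hypothetical.

```python
import math

def steep_sigmoid(z: float, k: float = 1.0) -> float:
    """Sigmoid of k * z, where k is an assumed steepness factor."""
    t = k * z
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    # Equivalent form for negative t; avoids overflow in exp.
    et = math.exp(t)
    return et / (1.0 + et)

# Compare a gentle, a standard, and a sharp curve at the same three inputs.
for k in (0.5, 1.0, 10.0):
    values = [round(steep_sigmoid(z, k), 3) for z in (-1.0, 0.0, 1.0)]
    print(f"k={k}: {values}")
```

For any positive k the middle value is 0.5 and every output lies between 0 and 1. Only the speed at which the curve moves away from the centre changes.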

Why this exact shape

The sigmoid is differentiable everywhere, which is essential because gradient descent needs a slope to follow. It also has a famously tidy derivative, \sigma'(z) = \sigma(z)\,(1 - \sigma(z)), so the slope can be computed from the output alone, which keeps the maths clean. Feed a linear score through it and you get logistic regression; stack it inside a network and it becomes an activation function.
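The derivative identity can be checked numerically. The sketch below compares \sigma(z)(1 - \sigma(z)) against a central finite difference, then feeds a linear score through the sigmoid the way logistic regression does. The helper names, the step size h, and the weight w and bias b are all hypothetical values chosen for illustration.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid, evaluated without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def sigmoid_grad(z: float) -> float:
    """Derivative via the identity sigma'(z) = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def central_difference(f, z: float, h: float = 1e-5) -> float:
    """Numerical slope estimate (f(z + h) - f(z - h)) / (2h)."""
    return (f(z + h) - f(z - h)) / (2.0 * h)

# The two columns should agree to many decimal places.
for z in (-2.0, 0.0, 3.0):
    print(z, sigmoid_grad(z), central_difference(sigmoid, z))

def predict_proba(x: float, w: float, b: float) -> float:
    """Logistic-regression style prediction: sigmoid of the linear score w*x + b."""
    return sigmoid(w * x + b)

# Made-up weight and bias, purely for illustration.
print(predict_proba(x=1.5, w=2.0, b=-1.0))
```

The slope is largest at z = 0, where it equals 0.25, and shrinks towards 0 as |z| grows.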