The Sigmoid Function
A line can output any number, from -\infty to +\infty. But a probability must live between 0 and 1. The sigmoid function is the squasher that bends the whole number line into that range:
\sigma(z) = \frac{1}{1 + e^{-z}}.
Big positive z gives an output near 1; big negative z gives an output near 0; and z = 0 sits exactly at 0.5. Its graceful S-shape turns a raw score into a probability.
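The formula can be sketched in a few lines of Python. The function name `sigmoid` is chosen here for illustration and is not from the original text. This version branches on the sign of z so that `math.exp` is never called with a large positive argument, which would overflow:

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        # e^(-z) <= 1 here, so the direct formula is safe.
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the equivalent form e^z / (1 + e^z),
    # which keeps the exponent negative.
    ez = math.exp(z)
    return ez / (1.0 + ez)

for z in (-10.0, 0.0, 10.0):
    print(z, sigmoid(z))
```

The three printed values should land near 0, at exactly 0.5, and near 1, matching the description above.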
The S-curve
The sigmoid curve can be sharpened or softened with a steepness slider: a large steepness makes it snap almost like a hard switch, while a small one leaves it gently sloping. Whatever the steepness, the curve always stays trapped between 0 and 1 and passes through 0.5 at the centre.
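One common way to model a steepness control is a multiplier k on the input, \sigma(kz). That is an assumption here, since the text does not say how the slider is parameterised, and the names `steep_sigmoid` and `k` are hypothetical. A minimal sketch:

```python
import math

def steep_sigmoid(z: float, k: float = 1.0) -> float:
    """Sigmoid with an assumed steepness multiplier k > 0: sigma(k * z)."""
    # tanh form of the same function: sigma(x) = 0.5 * (1 + tanh(x / 2)).
    # math.tanh saturates instead of overflowing for large |x|.
    return 0.5 * (1.0 + math.tanh(0.5 * k * z))

for k in (0.5, 1.0, 10.0):
    # Same inputs (z = 1 and z = 0) under different steepness values.
    print(k, steep_sigmoid(1.0, k), steep_sigmoid(0.0, k))
```

Under this parameterisation, raising k pushes the output at z = 1 closer to 1, while the output at z = 0 stays at 0.5 for every k.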
Why this exact shape
The sigmoid is differentiable everywhere, which is essential because gradient descent needs a slope to follow. It also has a famously tidy derivative, \sigma'(z) = \sigma(z)\,(1 - \sigma(z)), which keeps the maths clean. Feed a linear score through it and you get logistic regression; stack it inside a network and it becomes an activation function.
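The derivative identity can be checked numerically, and the logistic-regression use can be sketched in the same few lines. Everything named below (`sigmoid`, `sigmoid_grad`, `predict_proba`, the weights and the input) is hypothetical and chosen only for illustration:

```python
import math
from typing import Sequence

def sigmoid(z: float) -> float:
    """Logistic sigmoid, via the equivalent tanh form to avoid overflow."""
    return 0.5 * (1.0 + math.tanh(0.5 * z))

def sigmoid_grad(z: float) -> float:
    """Derivative from the identity sigma'(z) = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def predict_proba(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Logistic-regression style prediction: sigmoid of a linear score."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(score)

# Compare the identity with a central finite difference at a few points.
h = 1e-5
for z in (-2.0, 0.0, 3.0):
    numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
    print(z, sigmoid_grad(z), numeric)

# Made-up weights and input, purely for illustration.
print(predict_proba([0.8, -0.4], 0.1, [1.5, 2.0]))
```

The two derivative columns should agree to several decimal places, and the slope peaks at 0.25 when z = 0. The last line turns a linear score of 0.5 into a probability of roughly 0.62.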