Generalized Linear Models
Ordinary linear regression fits a straight line and assumes a continuous response with Gaussian noise. Wonderful for predicting
heights or house prices — hopeless for a great many real questions. Will this customer buy?
is a yes/no. How many support calls arrive this hour? is a non-negative count. Fit a straight
line to a 0/1 outcome and it cheerfully predicts a probability of 1.4, or of −0.2. The model is
answering the wrong kind of question.
Generalized linear models (GLMs) are the fix, and one of the most useful frameworks
in applied statistics. They keep the friendly linear predictor
\beta_0+\beta_1 x_1+\cdots but let the response come from any
exponential family, joined to the linear part by a link function. Logistic
regression and Poisson regression — the workhorses of classification and count modelling — are just
two members of this one family.
The three components of a GLM
Every generalized linear model is built from exactly three pieces:
- A random component. The response Y is drawn from an exponential-family distribution (normal, Bernoulli, Poisson, gamma…) with mean \mu = \mathbb{E}[Y].
- A linear predictor. The covariates enter only through a linear combination \eta = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p.
- A link function. A monotone function g ties the mean to the linear predictor: g(\mu) = \eta, equivalently \mu = g^{-1}(\eta).
The link is the clever part. It lets the mean live in its natural range — a probability in
(0,1), a rate in (0,\infty) — while the linear
predictor \eta roams freely over all of
\mathbb{R}. Ordinary linear regression is simply the special case with a
Gaussian response and the identity link
g(\mu)=\mu.
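As a rough illustration of how a link and its inverse move between the two scales, here is a minimal Python sketch. The function names (`logit`, `inv_logit`, `log_link`, `inv_log`, `identity`) are chosen for this example and are not taken from any particular library.

```python
import math

# Each link g maps the mean mu to the linear predictor eta;
# its inverse maps eta back into the mean's natural range.

def logit(mu):
    """Logit link: (0, 1) -> all real numbers."""
    return math.log(mu / (1.0 - mu))

def inv_logit(eta):
    """Inverse logit (sigmoid): all real numbers -> (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def log_link(mu):
    """Log link: (0, inf) -> all real numbers."""
    return math.log(mu)

def inv_log(eta):
    """Inverse log link: all real numbers -> (0, inf)."""
    return math.exp(eta)

def identity(mu):
    """Identity link used by ordinary linear regression."""
    return mu

# The same unconstrained eta gives a valid probability and a valid rate.
for eta in (-4.0, 0.0, 4.0):
    print(f"eta={eta:+.1f}  probability={inv_logit(eta):.4f}  rate={inv_log(eta):.4f}")
```

However large or small the linear predictor gets, the inverse logit stays strictly between 0 and 1 and the inverse log stays strictly positive.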
The canonical link comes from the exponential family
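Which link should you use? Each exponential family has a natural choice, the canonical
link, which is exactly the map from the mean to the natural parameter
we met when writing the family in canonical form. With that choice the linear predictor \eta
coincides with the natural parameter, which is why the same symbol serves for both. For the
Bernoulli the natural parameter is the log-odds, and for the Poisson it is the log of the mean, so
the canonical links are the logit and the log: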
- Logistic regression — Bernoulli response, logit link g(\mu)=\log\frac{\mu}{1-\mu}=\eta, so \mu=\dfrac{1}{1+e^{-\eta}} stays in (0,1).
- Poisson regression — count response, log link g(\mu)=\log\mu=\eta, so \mu=e^{\eta} stays positive.
- Both are fit by maximum likelihood. Neither has a closed-form solution for the coefficients, so software typically uses iteratively reweighted least squares (IRLS).
The logistic sigmoid is precisely the Bernoulli mean function
A'(\eta) from the exponential-family page — the theory of the previous page
is the machinery of this one.
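To make the fitting step less abstract, the sketch below fits a one-covariate logistic regression by Newton's method, which for the canonical logit link coincides with iteratively reweighted least squares. It is a bare-bones illustration in pure Python, not how any particular package implements it; the function name `fit_logistic_irls` and the toy data are invented for this example.

```python
import math

def sigmoid(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def fit_logistic_irls(xs, ys, n_iter=25, tol=1e-10):
    """Fit log-odds = b0 + b1 * x by Newton steps on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0            # score (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0    # Fisher information X^T W X, a 2x2 matrix
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)    # Bernoulli variance acts as the weight
            r = y - p            # residual on the mean scale
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 system (information) * step = score by hand.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Toy data (made up); the 0s and 1s overlap, so a finite maximum exists.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic_irls(xs, ys)
print(f"intercept={b0:.4f}  slope={b1:.4f}")
```

The weights `p * (1 - p)` are the Bernoulli variances, which is where the "reweighted" in the name comes from: each pass is a weighted least-squares solve whose weights depend on the current fit.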
Worked example 1 — logistic regression and log-odds
Model the probability a loan defaults as
\log\dfrac{p}{1-p} = \beta_0 + \beta_1 x, where
x is (say) debt-to-income ratio. The left side is the
log-odds, so a one-unit rise in x adds
\beta_1 to the log-odds — equivalently it multiplies the odds by
e^{\beta_1}. If \beta_1 = 0.7, each extra unit of
x multiplies the odds of default by
e^{0.7}\approx 2.01, so it roughly doubles the odds. To get an actual probability,
push the linear predictor through the sigmoid:
p = 1/(1+e^{-(\beta_0+\beta_1 x)}), which is guaranteed to land in
(0,1).
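The arithmetic of this example can be checked with a few lines of Python. The slope 0.7 comes from the text; the intercept of -3.0 is a hypothetical value picked only so that the probabilities can be computed.

```python
import math

beta0 = -3.0   # hypothetical intercept, not given in the text
beta1 = 0.7    # slope from the worked example

def default_probability(x):
    """Push the linear predictor through the sigmoid."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

def odds(p):
    return p / (1.0 - p)

odds_ratio = math.exp(beta1)
print(f"odds ratio per unit of x: {odds_ratio:.3f}")

for x in (0, 1, 2, 3):
    p = default_probability(x)
    print(f"x={x}  p={p:.3f}  odds={odds(p):.3f}")
```

Each step in x multiplies the printed odds by the same factor, while the printed probabilities change by different amounts from one row to the next.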
Worked example 2 — Poisson regression for counts
Model the number of website visits per hour as
\log\mu = \beta_0 + \beta_1 x, with x an
advertising-spend index. Because the link is the log, the mean is
\mu = e^{\beta_0+\beta_1 x} — always positive, as a count rate must be.
Coefficients act multiplicatively on the rate: a one-unit rise in x
multiplies the expected count by e^{\beta_1}. With
\beta_1=\log 2\approx 0.69, every extra unit of spend doubles
the expected traffic. This "log-linear" reading — effects that scale rather than add — is exactly what
you want for counts, and it falls out automatically from the log link.
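A short numerical check of the multiplicative reading, assuming a hypothetical baseline of 100 visits per hour at x = 0 (the text does not give an intercept):

```python
import math

beta0 = math.log(100.0)  # hypothetical baseline: 100 visits per hour at x = 0
beta1 = math.log(2.0)    # slope from the worked example

def expected_visits(x):
    """Inverse log link: mu = exp(beta0 + beta1 * x)."""
    return math.exp(beta0 + beta1 * x)

for x in (0, 1, 2, 3):
    print(f"x={x}  expected visits per hour={expected_visits(x):.1f}")
```

Each extra unit of x doubles the expected count, whatever the starting level.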
The logistic curve in motion
This is the logistic regression curve: the predicted probability
p(x)=1/(1+e^{-(\beta_0+\beta_1 x)}). Changing the intercept
\beta_0 shifts the curve left or right (moving the point where
p=0.5), and the slope \beta_1 makes the
transition gentle or steep; a negative \beta_1 flips the curve so that it decreases. Whatever
values the coefficients take, the curve never leaves (0,1). That is
the whole point of the link, and exactly what a straight line could not promise.
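The roles of the two coefficients can be made precise with a short derivation, written in the notation already used on this page. The symbol x_{1/2} for the midpoint is introduced here for convenience.

```latex
% Midpoint: where the curve crosses p = 1/2 (assuming \beta_1 \neq 0)
p(x) = \tfrac{1}{2}
\iff \beta_0 + \beta_1 x = 0
\iff x_{1/2} = -\frac{\beta_0}{\beta_1}

% Steepness: derivative of the sigmoid with respect to x
\frac{dp}{dx} = \beta_1 \, p(x)\,\bigl(1 - p(x)\bigr),
\qquad
\left.\frac{dp}{dx}\right|_{x = x_{1/2}} = \frac{\beta_1}{4}
```

So the intercept only moves the midpoint, while the slope sets the maximum steepness, which occurs at the midpoint.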
The tempting mistake is to code the response as 0/1 and run ordinary linear regression — the "linear
probability model." It breaks in two ways. First, the fitted line is unbounded, so for extreme
x it predicts probabilities above 1 or below 0, which are
meaningless. Second, a 0/1 outcome has variance p(1-p) that depends on the
mean, flatly violating the constant-variance assumption ordinary regression is built on. Logistic
regression cures both at a stroke: the sigmoid keeps predictions inside
(0,1), and the Bernoulli random component gets the variance right. If your
response is a category or a count, reach for the matching GLM — not a straight line.
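The first failure is easy to reproduce. The sketch below fits an ordinary least-squares line to a small made-up 0/1 data set using the textbook closed-form formulas, then evaluates it outside the observed range; the data and the function name `linear_probability` are illustrative only.

```python
# Toy 0/1 outcomes (made up) against a single covariate.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Closed-form simple linear regression: slope = Sxy / Sxx.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

def linear_probability(x):
    """'Probability' predicted by the straight-line fit."""
    return intercept + slope * x

for x in (-5, 3.5, 10):
    print(f"x={x}  predicted 'probability'={linear_probability(x):.3f}")
```

Inside the observed range the line looks plausible, but at x = -5 and x = 10 it returns values below 0 and above 1.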
GLM coefficients are easy to misread because they live on the link scale, not the probability scale. A coefficient
\beta_1=0.5 does not mean "probability rises by 0.5 per unit"; it
means the log-odds rise by 0.5, so the odds multiply by
e^{0.5}\approx 1.65. The effect on the actual probability depends on where
you are on the curve — near p=0.5 the sigmoid is steep and a unit of
x moves the probability a lot; out in the flat tails it barely nudges it.
This is why practitioners quote odds ratios (e^{\beta}):
they are constant across the curve and easy to reason about. The same warning holds for Poisson
regression, where e^{\beta} is a rate ratio. Read GLM coefficients
through the link, and they stop being mysterious.
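A small numerical check of this point, using the coefficient 0.5 from the paragraph above. The starting probabilities are arbitrary choices for illustration, and `shift_probability` is a helper defined only for this sketch.

```python
import math

beta1 = 0.5
odds_ratio = math.exp(beta1)   # constant multiplier on the odds

def shift_probability(p):
    """Probability after one extra unit of x, starting from probability p."""
    new_odds = (p / (1.0 - p)) * odds_ratio
    return new_odds / (1.0 + new_odds)

for p in (0.01, 0.5, 0.99):
    q = shift_probability(p)
    print(f"start p={p:.2f}  after one unit p={q:.4f}  change={q - p:+.4f}")
```

The odds are multiplied by the same factor in every row, yet the change in probability is much larger when starting from 0.5 than when starting from either tail.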