Generalized Linear Models

Ordinary linear regression fits a straight line and assumes a continuous response with Gaussian noise. That is wonderful for predicting heights or house prices, but hopeless for a great many real questions. "Will this customer buy?" is a yes/no question. "How many support calls arrive this hour?" asks for a non-negative count. Fit a straight line to a 0/1 outcome and it cheerfully predicts a probability of 1.4, or of −0.2. The model is answering the wrong kind of question.

Generalized linear models (GLMs) are the fix, and one of the most useful frameworks in applied statistics. They keep the friendly linear predictor \beta_0+\beta_1 x_1+\cdots but let the response come from any exponential family, joined to the linear part by a link function. Logistic regression and Poisson regression — the workhorses of classification and count modelling — are just two members of this one family.

The three components of a GLM

Every generalized linear model is built from exactly three pieces:

A random component. The response Y is drawn from an exponential-family distribution (normal, Bernoulli, Poisson, gamma…) with mean \mu = \mathbb{E}[Y].
A linear predictor. The covariates enter only through a linear combination \eta = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p.
A link function. A monotone function g ties the mean to the linear predictor: g(\mu) = \eta, equivalently \mu = g^{-1}(\eta).

The link is the clever part. It lets the mean live in its natural range — a probability in (0,1), a rate in (0,\infty) — while the linear predictor \eta roams freely over all of \mathbb{R}. Ordinary linear regression is simply the special case with a Gaussian response and the identity link g(\mu)=\mu.
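As a minimal sketch of how the three pieces fit together, the snippet below computes a linear predictor and pushes it through three inverse links. The coefficient and covariate values, and the helper names `linear_predictor` and `glm_mean`, are hypothetical and chosen only for illustration.

```python
import math

# Inverse links g^{-1} map the unbounded linear predictor eta back to the mean mu.
INVERSE_LINKS = {
    "identity": lambda eta: eta,                         # Gaussian: mu anywhere in R
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),   # Bernoulli: mu in (0, 1)
    "log": lambda eta: math.exp(eta),                    # Poisson: mu in (0, inf)
}


def linear_predictor(beta, x):
    """eta = beta_0 + beta_1 * x_1 + ... + beta_p * x_p."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))


def glm_mean(beta, x, link):
    """Mean response mu = g^{-1}(eta) for the named link."""
    return INVERSE_LINKS[link](linear_predictor(beta, x))


beta = [-1.0, 0.5, 2.0]   # hypothetical coefficients (beta_0, beta_1, beta_2)
x = [3.0, -2.0]           # hypothetical covariates (x_1, x_2)

for link in INVERSE_LINKS:
    print(f"{link:>8}: mu = {glm_mean(beta, x, link):.4f}")
```

The same negative linear predictor yields a negative mean under the identity link, but a valid probability under the logit link and a valid positive rate under the log link.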

The canonical link comes from the exponential family

Which link should you use? Each exponential family has a natural choice, the canonical link, which is exactly the map from the mean \mu to the natural parameter \eta we met when writing the family in canonical form. For the Bernoulli and Poisson families that map is the log-odds and the log, respectively:

Logistic regression — Bernoulli response, logit link g(\mu)=\log\frac{\mu}{1-\mu}=\eta, so \mu=\dfrac{1}{1+e^{-\eta}} stays in (0,1).
Poisson regression — count response, log link g(\mu)=\log\mu=\eta, so \mu=e^{\eta} stays positive.
Both are fit by maximum likelihood (there's no closed form; software uses iteratively reweighted least squares).

The logistic sigmoid is precisely the Bernoulli mean function A'(\eta) from the exponential-family page — the theory of the previous page is the machinery of this one.
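To make the fitting step concrete, here is a rough sketch of iteratively reweighted least squares for logistic regression with a single covariate. It is an illustration under simplifying assumptions, not how any particular library implements it: the data are made up, the function name `fit_logistic_irls` is hypothetical, and there are no safeguards for perfectly separated data.

```python
import math


def sigmoid(eta):
    """Inverse logit link: maps eta in R to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic_irls(xs, ys, max_iter=50, tol=1e-10):
    """Fit log(p / (1 - p)) = b0 + b1 * x by IRLS (one covariate)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Accumulate the weighted normal equations (X^T W X) b = X^T W z.
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for x, y in zip(xs, ys):
            eta = b0 + b1 * x
            mu = sigmoid(eta)
            w = mu * (1.0 - mu)        # Bernoulli variance, used as the weight
            wz = w * eta + (y - mu)    # w * z, where z = eta + (y - mu) / w
            s_w += w
            s_wx += w * x
            s_wxx += w * x * x
            s_wz += wz
            s_wxz += wz * x
        # Solve the 2x2 system by Cramer's rule.
        det = s_w * s_wxx - s_wx * s_wx
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


# Made-up, non-separable toy data (x, 0/1 outcome).
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic_irls(xs, ys)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```

Each pass is an ordinary weighted least-squares solve; the weights \mu(1-\mu) and the working response are recomputed from the current fit, which is where the "iteratively reweighted" in the name comes from.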

Worked example 1 — logistic regression and log-odds

Model the probability a loan defaults as \log\dfrac{p}{1-p} = \beta_0 + \beta_1 x, where x is (say) debt-to-income ratio. The left side is the log-odds, so a one-unit rise in x adds \beta_1 to the log-odds; equivalently, it multiplies the odds by e^{\beta_1}. If \beta_1 = 0.7, each extra unit of x multiplies the odds of default by e^{0.7}\approx 2.01, so it roughly doubles the odds. To get an actual probability, push the linear predictor through the sigmoid: p = 1/(1+e^{-(\beta_0+\beta_1 x)}), which is guaranteed to land in (0,1).
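The arithmetic of this example can be checked numerically. In the sketch below \beta_1 = 0.7 comes from the example, while the intercept \beta_0 = -3 and the x values are assumptions made up purely so there is something to evaluate.

```python
import math


def default_probability(x, beta0, beta1):
    """p = 1 / (1 + exp(-(beta0 + beta1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))


def odds(p):
    """Odds corresponding to probability p."""
    return p / (1.0 - p)


beta0 = -3.0   # hypothetical intercept, not from the text
beta1 = 0.7    # slope used in the worked example

previous_odds = None
for x in (1.0, 2.0, 3.0, 4.0):
    p = default_probability(x, beta0, beta1)
    line = f"x = {x:.0f}  p = {p:.3f}  odds = {odds(p):.3f}"
    if previous_odds is not None:
        line += f"  ratio to previous = {odds(p) / previous_odds:.3f}"
    print(line)
    previous_odds = odds(p)

odds_ratio = odds(default_probability(2.0, beta0, beta1)) / odds(
    default_probability(1.0, beta0, beta1)
)
```

The ratio between consecutive odds equals e^{0.7} at every step, regardless of the intercept chosen.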

Worked example 2 — Poisson regression for counts

Model the number of website visits per hour as \log\mu = \beta_0 + \beta_1 x, with x an advertising-spend index. Because the link is the log, the mean is \mu = e^{\beta_0+\beta_1 x} — always positive, as a count rate must be. Coefficients act multiplicatively on the rate: a one-unit rise in x multiplies the expected count by e^{\beta_1}. With \beta_1=\log 2\approx 0.69, every extra unit of spend doubles the expected traffic. This "log-linear" reading — effects that scale rather than add — is exactly what you want for counts, and it falls out automatically from the log link.
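A quick numerical check of the multiplicative reading, under assumed values: \beta_1 = \log 2 is taken from the example, and the baseline of 50 visits per hour (so \beta_0 = \log 50) is a hypothetical choice.

```python
import math


def expected_count(x, beta0, beta1):
    """Poisson mean under the log link: mu = exp(beta0 + beta1 * x)."""
    return math.exp(beta0 + beta1 * x)


beta0 = math.log(50.0)   # hypothetical baseline: 50 visits per hour at x = 0
beta1 = math.log(2.0)    # slope from the worked example

for x in range(4):
    print(f"x = {x}  expected visits per hour = {expected_count(x, beta0, beta1):.1f}")

rate_ratio = expected_count(1, beta0, beta1) / expected_count(0, beta0, beta1)
```

Each unit of x doubles the expected count, so equal steps in x produce growing absolute jumps in the mean.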

The logistic curve in motion

The predicted probability under a logistic regression is p(x)=1/(1+e^{-(\beta_0+\beta_1 x)}). Changing the intercept \beta_0 shifts the curve left or right, moving the point x=-\beta_0/\beta_1 where p=0.5. Changing the slope \beta_1 makes the transition gentle or steep, and a negative \beta_1 flips the curve so that it decreases. Whatever values \beta_0 and \beta_1 take, the curve never leaves (0,1). That is the whole point of the link, and exactly what a straight line could not promise.
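The behaviour of the curve can be explored by evaluating it on a grid. The sketch below uses a few hypothetical (\beta_0, \beta_1) pairs and reports the crossing point x = -\beta_0/\beta_1, which follows from setting \beta_0 + \beta_1 x = 0.

```python
import math


def logistic_curve(x, beta0, beta1):
    """p(x) = 1 / (1 + exp(-(beta0 + beta1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))


def midpoint(beta0, beta1):
    """The x at which p(x) = 0.5, i.e. where beta0 + beta1 * x = 0."""
    return -beta0 / beta1


grid = [-6, -4, -2, 0, 2, 4, 6]

# Hypothetical parameter settings: baseline, shifted, steeper, flipped.
settings = [(0.0, 1.0), (-2.0, 1.0), (0.0, 3.0), (0.0, -1.0)]

for beta0, beta1 in settings:
    values = [logistic_curve(x, beta0, beta1) for x in grid]
    row = "  ".join(f"{v:.3f}" for v in values)
    print(f"beta0 = {beta0:+.1f}, beta1 = {beta1:+.1f}, "
          f"p = 0.5 at x = {midpoint(beta0, beta1):+.1f}:  {row}")
```

Every printed value lies strictly between 0 and 1, and the last setting produces a decreasing row because its slope is negative.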

The tempting mistake is to code the response as 0/1 and run ordinary linear regression — the "linear probability model." It breaks in two ways. First, the fitted line is unbounded, so for extreme x it predicts probabilities above 1 or below 0, which are meaningless. Second, a 0/1 outcome has variance p(1-p) that depends on the mean, flatly violating the constant-variance assumption ordinary regression is built on. Logistic regression cures both at a stroke: the sigmoid keeps predictions inside (0,1), and the Bernoulli random component gets the variance right. If your response is a category or a count, reach for the matching GLM — not a straight line.
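Both failures are easy to reproduce. The sketch below fits an ordinary least-squares line to a made-up 0/1 data set and evaluates it outside the observed range, then tabulates the Bernoulli variance p(1-p) at a few probabilities. The data and the helper name `fit_ols` are illustrative assumptions.

```python
def fit_ols(xs, ys):
    """Closed-form simple linear regression: returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Made-up toy data with a 0/1 response.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

intercept, slope = fit_ols(xs, ys)


def linear_probability(x):
    """'Probability' predicted by the straight line; not bounded to [0, 1]."""
    return intercept + slope * x


# Failure 1: predictions escape [0, 1] away from the bulk of the data.
for x in (0.0, 2.25, 6.0):
    print(f"x = {x:.2f}  fitted 'probability' = {linear_probability(x):.3f}")

# Failure 2: the variance of a 0/1 outcome changes with its mean.
for p in (0.1, 0.5, 0.9):
    print(f"p = {p:.1f}  variance p(1-p) = {p * (1 - p):.2f}")
```

On this toy data the line already dips below 0 at x = 0 and exceeds 1 at x = 6, while the variance ranges from 0.09 to 0.25 depending on p.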

GLM coefficients are easy to misread because they live on the link scale, not the probability scale. A coefficient \beta_1=0.5 does not mean "probability rises by 0.5 per unit"; it means the log-odds rise by 0.5, so the odds multiply by e^{0.5}\approx 1.65. The effect on the actual probability depends on where you are on the curve: near p=0.5 the sigmoid is steep and a unit of x moves the probability a lot, while out in the flat tails it barely nudges it. This is why practitioners quote odds ratios (e^{\beta}): they are constant across the curve and easy to reason about. The same warning holds for Poisson regression, where e^{\beta} is a rate ratio. Read GLM coefficients through the link, and they stop being mysterious.
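The contrast between the two scales can be seen in a few lines. This sketch takes \beta_1 = 0.5 from the paragraph above and assumes \beta_0 = 0 and two starting points for x, all hypothetical, then compares the change in probability with the change in odds for a one-unit step.

```python
import math


def probability(x, beta0, beta1):
    """Logistic mean: p = 1 / (1 + exp(-(beta0 + beta1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))


def unit_step_effect(x, beta0, beta1):
    """Return (change in probability, odds ratio) when x rises by one unit."""
    p_before = probability(x, beta0, beta1)
    p_after = probability(x + 1.0, beta0, beta1)
    odds_before = p_before / (1.0 - p_before)
    odds_after = p_after / (1.0 - p_after)
    return p_after - p_before, odds_after / odds_before


beta0 = 0.0   # hypothetical intercept
beta1 = 0.5   # coefficient from the text

center_delta, center_ratio = unit_step_effect(0.0, beta0, beta1)   # starts at p = 0.5
tail_delta, tail_ratio = unit_step_effect(8.0, beta0, beta1)       # starts deep in the tail

print(f"near p = 0.5: probability change = {center_delta:.4f}, odds ratio = {center_ratio:.4f}")
print(f"in the tail:  probability change = {tail_delta:.4f}, odds ratio = {tail_ratio:.4f}")
```

The probability change differs by more than an order of magnitude between the two starting points, while the odds ratio is the same e^{0.5} in both.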