Convex Sets and Functions

Optimization is hard for one reason above all others: a landscape can hide many local minima, and standing in one you cannot see whether a deeper valley lies beyond the ridge. Convexity is the magic property that makes this fear vanish. In a convex problem there is only one valley — so any minimum you find is the minimum. Half of optimization is really about recognising, or engineering, convexity.

Two objects wear the name. A convex set is a region with no dents: pick any two points in it and the straight segment joining them stays entirely inside.

A set C \subseteq \mathbb{R}^n is convex if for all \mathbf{x}, \mathbf{y} \in C and every \lambda \in [0, 1], \lambda\mathbf{x} + (1-\lambda)\mathbf{y} \in C. The expression \lambda\mathbf{x} + (1-\lambda)\mathbf{y} is a convex combination — it sweeps out the whole line segment from \mathbf{y} (at \lambda = 0) to \mathbf{x} (at \lambda = 1).

A disk, a filled triangle, a half-plane, a line, all of \mathbb{R}^n — all convex. A crescent moon, a doughnut, an L-shape, a pair of separate blobs — not convex, because some chord pokes outside. Crucially, the intersection of convex sets is convex, which is why a feasible region cut out by many linear "\le" constraints (each a convex half-space) is always a convex polytope.

Convex functions: the chord lies above

A convex function curves upward like a bowl: the straight chord joining any two points of its graph never dips below the graph between them.

f is convex on a convex domain if for all x, y and \lambda \in [0, 1], f\bigl(\lambda x + (1-\lambda) y\bigr) \le \lambda f(x) + (1-\lambda) f(y). The left side is the function on the chord's midpoint parameter; the right side is the chord itself. If the inequality is always strict for x \ne y and 0 < \lambda < 1, f is strictly convex. Flip the inequality (\ge) and you have a concave function — a dome, the negative of a convex one.

Drag the slider to slide a probe point along the interval, and switch between a convex bowl and a wiggly non-convex curve. For the bowl the chord (its colour) stays above the graph everywhere — the gap is never negative. For the wiggle you can park the probe where the chord cuts below the curve: a single such spot is enough to disqualify convexity.

The second-derivative test

Checking every chord is impractical. For a twice-differentiable function there is a pointwise test: curvature must never turn downward.

A twice-differentiable f is convex on an interval iff f''(x) \ge 0 there (curvature never negative). In several variables, replace f'' by the Hessian \nabla^2 f: f is convex iff the Hessian is positive semidefinite everywhere on the domain.

Examples. f(x) = x^2 has f'' = 2 > 0 — strictly convex. e^x has f'' = e^x > 0 — convex. -\ln x has f'' = 1/x^2 > 0 — convex on x > 0. But x^3 has f'' = 6x, which is negative for x < 0: convex only on [0, \infty), not on the whole line.

A convex function has a wonderfully useful cousin: its epigraph — the set of all points on or above the graph, \{(x, t) : t \ge f(x)\} — is a convex set. So "convex function" and "convex set" are two faces of one idea: a function is convex exactly when the region above it is.

Why convexity guarantees a global optimum

Here is the payoff, and the short proof is worth savouring. Suppose f is convex and x^\star is a local minimum. Could a strictly lower point y exist somewhere far off, with f(y) < f(x^\star)? Walk from x^\star toward y along the segment. Convexity says every point on that segment satisfies

f\bigl(\lambda x^\star + (1-\lambda) y\bigr) \le \lambda f(x^\star) + (1-\lambda) f(y) < f(x^\star)

for every \lambda \in [0, 1) — so points arbitrarily close to x^\star already beat it. That contradicts x^\star being a local minimum. Hence no lower point exists, and the local minimum is global.

Minimizing a convex function over a convex feasible set:

This is why practitioners fight so hard to cast a problem as convex: a solver can then stop the instant it can't go downhill, certain it has found the true best answer — no ridge to peer over.

The defining inequality, extended to many points and weights that sum to one, is Jensen's inequality: f\!\left(\sum_i \lambda_i x_i\right) \le \sum_i \lambda_i f(x_i) \qquad (\lambda_i \ge 0,\ \textstyle\sum_i \lambda_i = 1), or in probability language, f(\mathbb{E}[X]) \le \mathbb{E}[f(X)] for convex f. It quietly runs through all of mathematics: it proves the arithmetic mean beats the geometric mean, that entropy is maximised by the uniform distribution, and that a risk-averse gambler (whose utility is concave) prefers a sure \$50 to a coin-flip between \$0 and \$100 — the same expected value, but \mathbb{E}[u(X)] \le u(\mathbb{E}[X]) makes the gamble worth less. Convexity, it turns out, is the mathematics of "don't put all your eggs in one basket".