Convex Sets and Functions
Optimization is hard for
one reason above all others: a landscape can hide many local minima, and standing in one you cannot
see whether a deeper valley lies beyond the ridge. Convexity is the magic property
that makes this fear vanish. In a convex problem there is only one valley — so any minimum
you find is the minimum. Half of optimization is really about recognising, or engineering,
convexity.
Two objects wear the name. A convex set is a region with no dents: pick any two
points in it and the straight segment joining them stays entirely inside.
A set C \subseteq \mathbb{R}^n is convex if for all
\mathbf{x}, \mathbf{y} \in C and every
\lambda \in [0, 1],
\lambda\mathbf{x} + (1-\lambda)\mathbf{y} \in C.
The expression \lambda\mathbf{x} + (1-\lambda)\mathbf{y} is a
convex combination — it sweeps out the whole line segment from
\mathbf{y} (at \lambda = 0) to
\mathbf{x} (at \lambda = 1).
A disk, a filled triangle, a half-plane, a line, all of \mathbb{R}^n — all
convex. A crescent moon, a doughnut, an L-shape, a pair of separate blobs — not convex, because some
chord pokes outside. Crucially, the intersection of convex sets is convex, which is
why a feasible region cut out by many linear "\le" constraints (each a
convex half-space) is always a convex polytope.
Convex functions: the chord lies above
A convex function curves upward like a bowl: the straight chord joining any two
points of its graph never dips below the graph between them.
f is convex on a convex domain if for all
x, y and \lambda \in [0, 1],
f\bigl(\lambda x + (1-\lambda) y\bigr) \le \lambda f(x) + (1-\lambda) f(y).
The left side is the function on the chord's midpoint parameter; the right side is the chord itself.
If the inequality is always strict for x \ne y and
0 < \lambda < 1, f is strictly
convex. Flip the inequality (\ge) and you have a
concave function — a dome, the negative of a convex one.
Drag the slider to slide a probe point along the interval, and switch between a convex bowl and a
wiggly non-convex curve. For the bowl the chord (its colour) stays above the graph
everywhere — the gap is never negative. For the wiggle you can park the probe where the chord cuts
below the curve: a single such spot is enough to disqualify convexity.
The second-derivative test
Checking every chord is impractical. For a twice-differentiable function there is a pointwise test:
curvature must never turn downward.
A twice-differentiable f is convex on an interval iff
f''(x) \ge 0 there (curvature never negative). In several variables, replace
f'' by the Hessian
\nabla^2 f: f is convex iff the Hessian is
positive
semidefinite everywhere on the domain.
Examples. f(x) = x^2 has
f'' = 2 > 0 — strictly convex. e^x has
f'' = e^x > 0 — convex. -\ln x has
f'' = 1/x^2 > 0 — convex on x > 0. But
x^3 has f'' = 6x, which is negative for
x < 0: convex only on [0, \infty), not on the
whole line.
A convex function has a wonderfully useful cousin: its epigraph — the set of all
points on or above the graph, \{(x, t) : t \ge f(x)\} — is a
convex set. So "convex function" and "convex set" are two faces of one idea: a function is
convex exactly when the region above it is.
Why convexity guarantees a global optimum
Here is the payoff, and the short proof is worth savouring. Suppose f is
convex and x^\star is a local minimum. Could a strictly lower
point y exist somewhere far off, with
f(y) < f(x^\star)? Walk from x^\star toward
y along the segment. Convexity says every point on that segment satisfies
f\bigl(\lambda x^\star + (1-\lambda) y\bigr) \le \lambda f(x^\star) + (1-\lambda) f(y) < f(x^\star)
for every \lambda \in [0, 1) — so points arbitrarily close to
x^\star already beat it. That contradicts
x^\star being a local minimum. Hence no lower point exists, and the local
minimum is global.
Minimizing a convex function over a convex feasible set:
- every local minimum is a global minimum;
- the set of minimizers is itself convex (no isolated rival optima);
- if f is strictly convex, the minimizer is
unique.
This is why practitioners fight so hard to cast a problem as convex: a solver can then stop the
instant it can't go downhill, certain it has found the true best answer — no ridge to peer over.
The defining inequality, extended to many points and weights that sum to one, is
Jensen's inequality:
f\!\left(\sum_i \lambda_i x_i\right) \le \sum_i \lambda_i f(x_i) \qquad (\lambda_i \ge 0,\ \textstyle\sum_i \lambda_i = 1),
or in probability language, f(\mathbb{E}[X]) \le \mathbb{E}[f(X)] for
convex f. It quietly runs through all of mathematics: it proves the
arithmetic mean beats the geometric mean, that entropy is maximised by the uniform distribution, and
that a risk-averse gambler (whose utility is concave) prefers a sure \$50 to a coin-flip between \$0
and \$100 — the same expected value, but \mathbb{E}[u(X)] \le u(\mathbb{E}[X])
makes the gamble worth less. Convexity, it turns out, is the mathematics of "don't put all your eggs
in one basket".
-
Convex set ≠ convex function. They are different objects. The link is the
epigraph: a function is convex iff the region above its graph is a convex set. Don't ask
whether a set is a "convex function" or vice versa.
-
A concave function is not "non-convex". Concave means bowl-down
(f'' \le 0); its negative is convex. "Non-convex" is the broad category
of everything that is neither — wiggly functions with several dips. Maximizing a concave function is
an easy (convex) problem; minimizing one is generally hard.
-
Convexity is a property of a function and its domain.
x^3 is convex on [0, \infty) but not on all of
\mathbb{R}, because f'' = 6x changes sign. Always
state where.