Lagrange Multipliers
Unconstrained optimisation sets the whole gradient to zero. But the interesting problems come fenced in: maximise
output subject to a fixed budget, minimise surface area at a fixed volume. We want to
optimise f(x, y) while a constraint holds,
g(x, y) = c.
Now \nabla f = \mathbf{0} is the wrong condition: the best point
on the constraint curve is rarely a free peak of f. The right
condition is startlingly clean. At a constrained optimum the two gradients are
parallel:
\nabla f = \lambda\, \nabla g,
for some scalar \lambda, the Lagrange
multiplier. One little equation captures the whole geometry.
Deriving the condition geometrically
The argument rests entirely on what the gradient means: it points in the direction of
steepest ascent and is perpendicular to level curves.
Step 1 — you are confined to the constraint curve. Every allowed point
satisfies g = c, so you may only travel along the curve
g(x, y) = c. Let \mathbf{T} be its
tangent direction at a candidate point.
Step 2 — at a constrained max, moving can't increase f.
If stepping along the curve in direction \mathbf{T} raised
f even slightly, the point wouldn't be the maximum: you'd just
take that step. The same holds for the opposite direction -\mathbf{T}, so the rate of
change of f along the curve can be neither positive nor negative; it must
be zero. That rate is a directional derivative:
D_{\mathbf{T}} f = \nabla f \cdot \mathbf{T} = 0.
Step 3 — so \nabla f has no component along the
constraint. A zero dot product means \nabla f is
perpendicular to the tangent \mathbf{T} of the
constraint curve.
Step 4 — but \nabla g is also perpendicular to that
tangent. The constraint curve is itself a level curve of
g (the level g = c), and a gradient is
always perpendicular to its own level curve:
\nabla g \cdot \mathbf{T} = 0.
Step 5 — two vectors perpendicular to the same line are parallel. In the
plane there is only one direction perpendicular to \mathbf{T}
(up to sign). Both \nabla f and
\nabla g point along it, so they are scalar multiples of each
other:
\nabla f = \lambda\, \nabla g.
Geometrically: at the optimum the level curve of f is
tangent to the constraint curve. As the value of f rises, its contours
sweep across the constraint, and the last contour to touch it
before leaving is the one that just kisses it: the two curves share a tangent line, hence
have parallel normals.
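The five steps can be checked numerically. The sketch below uses a hypothetical illustration that is not taken from the text: maximise f(x, y) = x + 2y on the unit circle x^2 + y^2 = 1. It finds the best point by walking round the circle, then tests that both gradients are perpendicular to the tangent there and parallel to each other. The helper names (`grad_f`, `grad_g`, `dot`, `cross`) and the grid size are assumptions made for the example.

```python
import math

# Hypothetical illustration (not from the text): maximise f(x, y) = x + 2y
# on the unit circle g(x, y) = x^2 + y^2 = 1.

def grad_f(x, y):
    return (1.0, 2.0)

def grad_g(x, y):
    return (2.0 * x, 2.0 * y)

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def cross(u, v):
    # 2D cross product; it is zero exactly when u and v are parallel.
    return u[0] * v[1] - u[1] * v[0]

# Walk round the circle via (cos t, sin t) and keep the point with the largest f.
n = 200_000
best_t = max((2.0 * math.pi * k / n for k in range(n)),
             key=lambda t: math.cos(t) + 2.0 * math.sin(t))
x, y = math.cos(best_t), math.sin(best_t)
T = (-math.sin(best_t), math.cos(best_t))     # tangent to the circle at (x, y)

df_along_T = dot(grad_f(x, y), T)             # Steps 2 and 3: close to zero
dg_along_T = dot(grad_g(x, y), T)             # Step 4: close to zero
parallel = cross(grad_f(x, y), grad_g(x, y))  # Step 5: close to zero
lam = grad_f(x, y)[0] / grad_g(x, y)[0]       # multiplier from the x-components

print(f"point = ({x:.4f}, {y:.4f})")
print(f"grad f . T = {df_along_T:.2e}, grad g . T = {dg_along_T:.2e}")
print(f"cross(grad f, grad g) = {parallel:.2e}, lambda = {lam:.4f}")
```

The residuals are only as small as the grid allows; a finer grid shrinks them further.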
To find the extrema of f(x, y) subject to
g(x, y) = c (with \nabla g \neq \mathbf{0}
on the constraint), solve the system:
- Parallel gradients: \nabla f = \lambda\, \nabla g, i.e. f_x = \lambda g_x and f_y = \lambda g_y.
- The constraint itself: g(x, y) = c.
- That is three equations in the three unknowns x, y, \lambda; solve for the candidate points.
- Evaluate f at every solution; the largest value is the constrained maximum, the smallest the constrained minimum, provided such an extremum exists (it always does when the constraint curve is closed and bounded).
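When the three equations cannot be solved by hand, they can be handed to a root finder. The following is a minimal sketch, not a production solver: it applies Newton's method with a finite-difference Jacobian to the system above. The function name `solve_lagrange`, its parameters, and the test problem (f = x + 2y on the unit circle) are all assumptions chosen for illustration. It also assumes the Jacobian stays non-singular along the way.

```python
def solve_lagrange(grad_f, g, grad_g, c, guess, steps=50, h=1e-6):
    """Newton's method on f_x = lam*g_x, f_y = lam*g_y, g = c (a sketch)."""

    def residual(v):
        x, y, lam = v
        fx, fy = grad_f(x, y)
        gx, gy = grad_g(x, y)
        return [fx - lam * gx, fy - lam * gy, g(x, y) - c]

    def solve3(A, b):
        # Gaussian elimination with partial pivoting for a 3x3 system.
        M = [row[:] + [bi] for row, bi in zip(A, b)]
        for i in range(3):
            p = max(range(i, 3), key=lambda r: abs(M[r][i]))
            M[i], M[p] = M[p], M[i]
            for r in range(i + 1, 3):
                factor = M[r][i] / M[i][i]
                for k in range(i, 4):
                    M[r][k] -= factor * M[i][k]
        sol = [0.0, 0.0, 0.0]
        for i in (2, 1, 0):
            s = M[i][3] - sum(M[i][k] * sol[k] for k in range(i + 1, 3))
            sol[i] = s / M[i][i]
        return sol

    v = list(guess)
    for _ in range(steps):
        r = residual(v)
        if max(abs(ri) for ri in r) < 1e-12:
            break
        # Jacobian by central differences, one column per unknown.
        J = [[0.0] * 3 for _ in range(3)]
        for j in range(3):
            vp, vm = v[:], v[:]
            vp[j] += h
            vm[j] -= h
            rp, rm = residual(vp), residual(vm)
            for i in range(3):
                J[i][j] = (rp[i] - rm[i]) / (2 * h)
        step = solve3(J, [-ri for ri in r])
        v = [vi + si for vi, si in zip(v, step)]
    return tuple(v)


# Hypothetical test problem: f(x, y) = x + 2y on the circle x^2 + y^2 = 1.
grad_f = lambda x, y: (1.0, 2.0)
g = lambda x, y: x * x + y * y
grad_g = lambda x, y: (2.0 * x, 2.0 * y)

# Different starting guesses can land on different candidate points.
candidates = [solve_lagrange(grad_f, g, grad_g, 1.0, guess)
              for guess in [(0.5, 0.5, 1.0), (-0.5, -0.5, -1.0)]]
values = [x + 2.0 * y for x, y, _ in candidates]
print("candidates:", candidates)
print("max f =", max(values), " min f =", min(values))
```

Newton's method only finds the root nearest its starting guess, so the last step of the recipe still applies: collect every candidate and compare the values of f.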
A worked example
Maximise f(x, y) = xy subject to the constraint
x + y = 10. (The largest-area rectangle of a fixed perimeter, in
disguise.)
Step 1 — the two gradients. With
g(x, y) = x + y,
\nabla f = (y,\, x), \qquad \nabla g = (1,\, 1).
Step 2 — set them parallel.
\nabla f = \lambda \nabla g gives two equations:
y = \lambda \cdot 1, \qquad x = \lambda \cdot 1.
Step 3 — eliminate \lambda. Both equal
\lambda, so x = y.
Step 4 — use the constraint. Substitute
x = y into x + y = 10:
x + x = 10 \;\Rightarrow\; x = 5, \quad y = 5.
Step 5 — the optimal value.
f(5, 5) = 5 \cdot 5 = 25, with multiplier
\lambda = 5. Among all ways to split
10 into two parts, the product is largest when the parts are
equal — the constrained maximum is 25.
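As a sanity check on the worked example, the constraint can simply be scanned. The sketch below substitutes y = 10 - x, evaluates the product on a grid, and reads the multiplier off the two gradient equations. The grid size and variable names are assumptions made for the example.

```python
# Scan the constraint x + y = 10 and record where the product xy peaks.
c = 10.0
n = 10_000
xs = [c * k / n for k in range(n + 1)]        # x from 0 to 10
best_x = max(xs, key=lambda x: x * (c - x))   # y = c - x on the constraint
best_y = c - best_x
best_f = best_x * best_y

# The gradient equations y = lambda and x = lambda give the multiplier twice.
lam_from_fx = best_y
lam_from_fy = best_x

print(f"x = {best_x}, y = {best_y}, f = {best_f}")
print(f"lambda from f_x: {lam_from_fx}, from f_y: {lam_from_fy}")
```

Both readings of the multiplier agree at the peak, which is the parallel-gradient condition in action.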
The multiplier \lambda looks like algebraic scaffolding to be
discarded, but it carries real meaning. It measures how much the optimal value of
f would change if you relaxed the constraint by one
unit. Writing the optimum as a function of the constraint level
c,
\frac{d f^{*}}{d c} = \lambda.
In our example \lambda = 5: nudging the budget from
x + y = 10 to 11 would lift the
maximum product by about 5 (indeed the new optimum is
5.5^2 = 30.25, up 5.25 — and
5 is the first-order estimate). To an economist this is a
shadow price: the marginal value of one more unit of the scarce resource.
It tells a firm, to first order, how much it should be willing to pay per unit to loosen a binding
constraint: the sensitivity of the prize to the fence.
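The sensitivity claim is easy to test on the worked example. The gradient equations give x = y whatever the constraint level, so for x + y = c the optimum sits at x = y = c/2 and the optimal value is (c/2)^2. The sketch below compares a finite-difference slope of that value with the multiplier; `optimal_value` is a hypothetical helper name introduced here.

```python
def optimal_value(c):
    # For x + y = c the gradient equations give x = y = c / 2.
    return (c / 2.0) ** 2

lam = 10.0 / 2.0   # multiplier at c = 10, since lambda = x = y = 5

# Central-difference estimate of d f* / d c at c = 10.
h = 1e-4
slope = (optimal_value(10.0 + h) - optimal_value(10.0 - h)) / (2 * h)

# Relaxing the constraint by a whole unit: actual gain vs first-order estimate.
actual_gain = optimal_value(11.0) - optimal_value(10.0)
first_order_gain = lam * 1.0

print(f"slope = {slope:.6f}, lambda = {lam}")
print(f"actual gain = {actual_gain}, first-order estimate = {first_order_gain}")
```

The gap between the actual gain and the estimate comes from the curvature of the optimal value in c, which the first-order estimate ignores.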
Slide to the tangency
The straight line is the constraint x + y = 10; the curved arcs
are level curves of f = xy (hyperbolae, one per value of the
product). Slide the point along the constraint. Watch the contour through the point: almost
everywhere it cuts across the line, meaning you could slide to a higher contour. At
x = y = 5 the contour is tangent to the line: there
\nabla f = (y, x) = (5, 5) lines up with
\nabla g = (1, 1), and the product
f = xy peaks at 25.
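The same slide can be imitated without the picture. The sketch below moves along the line x + y = 10 and prints the 2D cross product of the two gradients, which is zero only where they are parallel. The function name `tangency_gap` and the sample points are assumptions made for the example.

```python
def tangency_gap(x, c=10.0):
    """2D cross product of grad f = (y, x) and grad g = (1, 1) on x + y = c."""
    y = c - x
    grad_f = (y, x)
    grad_g = (1.0, 1.0)
    return grad_f[0] * grad_g[1] - grad_f[1] * grad_g[0]

# Slide the point along the constraint and watch the gap change sign.
for x in [2.0, 4.0, 5.0, 6.0, 8.0]:
    y = 10.0 - x
    print(f"x = {x}, y = {y}, f = {x * y}, gap = {tangency_gap(x)}")
```

A non-zero gap means the contour cuts across the line, so sliding further still raises the product; the sign of the gap shows which way to slide.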